TITLE OF INVENTION
HIGH MOLECULAR WEIGHT SURFACE PROTEINS
OF NON-TYPEABLE HAEMOPHILUS

FIELD OF INVENTION
This invention relates to high molecular weight
proteins of non-typeable Haemophilus.



BACKGROUND TO THE INVENTION
Non-typeable Haemophilus influenzae are non-encapsulated
organisms that are defined by their lack of
reactivity with antisera against known H. influenzae
capsular antigens.

These organisms commonly inhabit the upper
respiratory tract of humans and are frequently
responsible for a variety of common mucosal surface
infections, such as otitis media, sinusitis,
conjunctivitis, chronic bronchitis and pneumonia. Otitis
media remains an important health problem for children:
most children have had at least one episode of otitis
by their third birthday, and approximately one-third of
children have had three or more episodes. Non-typeable
Haemophilus influenzae generally accounts for about 20 to
25% of cases of acute otitis media and for a larger
percentage of cases of chronic otitis media with effusion.

A critical first step in the pathogenesis of these
infections is colonization of the respiratory tract
mucosa. Bacterial surface molecules which mediate
adherence are, therefore, of particular interest as
possible vaccine candidates.

Since the non-typeable organisms do not have a
polysaccharide capsule, they are not controlled by the

acid molecule coding for a high

A DNA sequence according to (c) may be one having at
least about 90% identity of sequence to the DNA sequences
(a) or (b).
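The specification does not say how percent identity is to be calculated. As a minimal sketch only, assuming the two DNA sequences have already been aligned to equal length by a separate alignment program, and assuming a convention in which a column with a gap ('-') in one sequence counts as a mismatch, identity against the 90% threshold could be checked as follows. The function names and the gap convention are illustrative assumptions, not part of the invention.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned nucleotide sequences.

    Assumed convention (illustrative only): '-' marks an alignment gap;
    a gap opposite a base counts as a mismatch, and columns gapped in
    both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no comparable columns")
    # A column counts as identical only if both residues are the same base.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the identity is at least `threshold` percent (90% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical 10-nt alignment with one mismatch: 9 of 10 columns identical.
    print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # 90.0
    print(meets_threshold("ACGTACGTAC", "ACGTACGTTC"))   # True
```

Other conventions (for example, dividing by the length of the shorter ungapped sequence) can give different figures for the same pair, so the choice should be fixed before applying a threshold.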

The inventor has further found that correct processing of
the HMW protein requires the presence of additional
downstream nucleic acid sequences. Accordingly, a
further aspect of the present invention provides an
isolated and purified gene cluster comprising a first
nucleotide sequence encoding a high molecular weight
protein of a non-typeable Haemophilus strain and at least
one downstream nucleotide sequence for effecting
expression of a gene product of the first nucleotide
sequence fully encoded by the structural gene.

The gene cluster may comprise a DNA sequence
encoding high molecular weight protein HMW1 or HMW2 and
two downstream accessory genes. The gene cluster may
have the DNA sequence shown in Figure 6 (SEQ ID No: 5) or
Figure 7 (SEQ ID No: 6).

In an additional aspect, the present invention
includes a vector adapted for transformation of a host,
comprising a nucleic acid molecule as provided herein,
particularly the gene cluster provided herein. The
vector may be an expression vector or a plasmid adapted
for expression of the encoded high molecular weight
protein, fragments or analogs thereof, in a heterologous
or homologous host and comprising expression means
operatively coupled to the nucleic acid molecule. The
expression means may include a nucleic acid portion
encoding a leader sequence for secretion from the host of
the high molecular weight protein. The expression means
may include a nucleic acid portion encoding a lipidation
signal for expression from the host of a lipidated form
of the high molecular weight protein. The host may be
selected from, for example, E. coli, Bacillus,
Haemophilus, fungi, yeast, baculovirus and Semliki Forest
Virus expression systems. The invention further includes
a recombinant high molecular weight protein of non-typeable
Haemophilus or fragment or analog thereof
producible by the transformed host.

In another aspect, the invention provides an
isolated and purified high molecular weight protein of
non-typeable Haemophilus influenzae which is encoded by
a nucleic acid molecule as provided herein. Such high
molecular weight proteins may be produced recombinantly,
so as to be devoid of non-high molecular weight proteins of
non-typeable Haemophilus influenzae, or obtained from natural
sources.

Such protein may be characterized by at least one
surface-exposed B-cell epitope which is recognized by
monoclonal antibody AD6 (ATCC ). Such protein may
be HMW1 encoded by the DNA sequence shown in Figure 1
(SEQ ID No: 1) and having the derived amino acid sequence
of Figure 2 (SEQ ID No: 2) and having an apparent
molecular weight of 125 kDa. Such protein may be HMW2
encoded by the DNA sequence shown in Figure 3 (SEQ ID No: 3)
and having the derived amino acid sequence of Figure
4 (SEQ ID No: 4) and having an apparent molecular weight
of 120 kDa. Such protein may be HMW3 encoded by the DNA
sequence shown in Figure 8 (SEQ ID No: 7) and having the
derived amino acid sequence of Figure 10 (SEQ ID No: 9)
and having an apparent molecular weight of 125 kDa. Such
protein may be HMW4 encoded by the DNA sequence shown in
Figure 9 (SEQ ID No: 8) and having the derived amino acid
sequence shown in Figure 10 (SEQ ID No: 10) and having
an apparent molecular weight of 123 kDa.

A further aspect of the invention provides an
isolated and purified high molecular weight protein of
non-typeable Haemophilus influenzae which is
antigenically related to the filamentous hemagglutinin
surface protein of Bordetella pertussis, particularly
HMW1, HMW2, HMW3 or HMW4.
The immunogenic compositions of the invention (including
vaccines) may further comprise at least one other
immunogenic or immunostimulating material, and the
immunostimulating material may be at least one adjuvant.
Suitable adjuvants for use in the present invention
include (but are not limited to) aluminum phosphate,
aluminum hydroxide, QS21, Quil A, derivatives and
components thereof, ISCOM matrix, calcium phosphate,
calcium hydroxide, zinc hydroxide, a glycolipid analog,
an octadecyl ester of an amino acid, a muramyl dipeptide,
polyphosphazene, ISCOPREP, DC-chol, DDBA and a lipoprotein,
and other adjuvants to induce a Th1 response.
Advantageous combinations of adjuvants are described in
copending United States patent Application Serial No.
08/261,194 filed June 16, 1994, assigned to Connaught
Laboratories Limited, the disclosure of which is
incorporated herein by reference.

In a further aspect of the invention, there is
provided a method of generating an immune response in a
host, comprising administering thereto an immuno-effective
amount of the immunogenic composition as
provided herein. The immune response may be a humoral or
a cell-mediated immune response. Hosts in which
protection against disease may be conferred include
primates, including humans.

The present invention additionally provides a method
of producing antibodies specific for a high molecular
weight protein of non-typeable Haemophilus influenzae,
comprising:

(a) administering the high molecular weight protein
or epitope-containing peptide provided herein to at least
one mouse to produce at least one immunized mouse;

(b) removing B-lymphocytes from the at least one
immunized mouse;
(c) fusing the B-lymphocytes from the at least one
immunized mouse with myeloma cells, thereby producing
hybridomas;

(d) cloning the hybridomas;

(e) selecting clones which produce anti-high
molecular weight protein antibody;

(f) culturing the anti-high molecular weight
protein antibody-producing clones; and then

(g) isolating anti-high molecular weight protein
antibodies from the cultures.

Additional aspects of the present invention include
monoclonal antibody AD6 and monoclonal antibody 10C5.

The present invention provides, in an additional
aspect thereof, a method for producing an immunogenic
composition, comprising administering the immunogenic
composition provided herein to a first test host to
determine an amount and a frequency of administration
thereof to elicit a selected immune response against a
high molecular weight protein of non-typeable Haemophilus
influenzae; and formulating the immunogenic composition
in a form suitable for administration to a second host in
accordance with the determined amount and frequency of
administration. The second host may be a human.

The novel envelope protein provided herein is useful
in diagnostic procedures and kits for detecting
antibodies to high molecular weight proteins of non-typeable
Haemophilus influenzae. Further, monoclonal
antibodies specific for the high molecular weight protein or
epitopes thereof are useful in diagnostic procedures and
kits for detecting the presence of the high molecular
weight protein.

Accordingly, a further aspect of the invention
provides a method of determining the presence, in a
sample, of antibodies specifically reactive with a high
molecular weight protein of Haemophilus influenzae,
comprising the steps of:



WO 97/36914 



PCT/US97/04707




(a) contacting the sample with the high molecular
weight protein or epitope-containing peptide as
provided herein to produce complexes comprising the
protein and any said antibodies present in the
sample specifically reactive therewith; and

(b) determining production of the complexes.

In a further aspect of the invention, there is
provided a method of determining the presence, in a
sample, of a high molecular weight protein of Haemophilus
influenzae or an epitope-containing peptide, comprising
the steps of:

(a) immunizing a host with the protein or peptide
as provided herein, to produce antibodies specific
for the protein or peptide;

(b) contacting the sample with the antibodies to
produce complexes comprising any high molecular
weight protein or epitope-containing peptide present
in the sample and said specific antibodies; and

(c) determining production of the complexes.

A further aspect of the invention provides a
diagnostic kit for determining the presence of antibodies
in a sample specifically reactive with a high molecular
weight protein of non-typeable Haemophilus influenzae or
epitope-containing peptide, comprising:

(a) the high molecular weight protein or epitope-containing
peptide as provided herein;

(b) means for contacting the protein or peptide
with the sample to produce complexes comprising the
protein or peptide and any said antibodies present
in the sample; and

(c) means for determining production of the
complexes.

The invention also provides a diagnostic kit for
detecting the presence, in a sample, of a high molecular
weight protein of Haemophilus influenzae or epitope-containing
peptide, comprising:
(a) an antibody specific for the novel envelope
protein as provided herein;

(b) means for contacting the antibody with the
sample to produce a complex comprising the protein
or peptide and protein-specific antibody; and

(c) means for determining production of the
complex.

In this application, the term "high molecular weight
protein" is used to define a family of high molecular
weight proteins of Haemophilus influenzae, generally
having an apparent molecular weight of from about 120 to
about 130 kDa, and includes proteins having variations in
their amino acid sequences. In this application, a first
protein or peptide is a "functional analog" of a second
protein or peptide if the first protein or peptide is
immunologically related to and/or has the same function
as the second protein or peptide. The functional analog
may be, for example, a fragment of the protein or a
substitution, addition or deletion mutant thereof. The
invention also extends to such functional analogs.

Advantages of the present invention include:

an isolated and purified envelope high molecular
weight protein of Haemophilus influenzae, produced
recombinantly to be devoid of non-high molecular weight
proteins of Haemophilus influenzae or obtained from natural
sources, as well as nucleic acid molecules encoding the
same;

high molecular weight protein-specific human
monoclonal antibodies which recognize conserved epitopes
in such protein; and

diagnostic kits and immunological reagents for
specific identification of hosts infected by Haemophilus
influenzae.



W097A36914 

PCT/US97/04707 



20 



25 



30 



11 



BRIEF DESCRIPTION OF DRAWINGS
Figures 1A to 1G contain the DNA sequence of a gene
coding for protein HMW1 (SEQ ID No: 1). The hmw1A open
reading frame extends from nucleotides 351 to 4958;

Figures 2A and 2B contain the derived amino acid
sequence of protein HMW1 (SEQ ID No: 2);

Figures 3A to 3G contain the DNA sequence of a gene
coding for protein HMW2 (SEQ ID No: 3). The hmw2A
open reading frame extends from nucleotides 382 to 4782;

Figures 4A and 4B contain the derived amino acid
sequence of HMW2 (SEQ ID No: 4);

Figure 5A shows restriction maps of representative
recombinant phages which contained the HMW1 or HMW2
structural genes and of HMW1 plasmid subclones. The
shaded boxes indicate the location of the structural
genes. In the recombinant phage, transcription proceeds
from left to right for the HMW1 gene and from right to
left for the HMW2 gene;

Figure 5B shows the restriction map of the T7
expression vector pT7-7. This vector contains the T7 RNA
polymerase promoter φ10, a ribosomal binding site (rbs),
and the translational start site for the T7 gene 10
protein upstream from a multiple cloning site;

Figures 6A to 6L contain the DNA sequence of a gene
cluster for the hmw1 gene (SEQ ID NO: 5), comprising
nucleotides 351 to 4958 (ORF a) (as in Figure 1), as well
as two additional downstream genes in the 3' flanking
region, comprising ORFs b, nucleotides 5114 to 6748, and
c, nucleotides 7062 to 9011;

Figure 7 contains the DNA sequence of a gene
cluster for the hmw2 gene (SEQ ID NO: 6), comprising
nucleotides 792 to 5222 (ORF a) (as in Figure 3), as well
as two additional downstream genes in the 3' flanking
region, comprising ORFs b, nucleotides 5375 to 7009, and
c, nucleotides 7249 to 9198;






Figures 8A and 8B contain the DNA sequence of a gene
coding for protein HMW3 (SEQ ID NO: 7);

Figures 9A and 9B contain the DNA sequence of a gene
coding for protein HMW4 (SEQ ID NO: 8);

Figures 10A to 10L contain a comparison table for
the derived amino acid sequences of proteins HMW1 (SEQ ID
No: 2), HMW2 (SEQ ID No: 4), HMW3 (SEQ ID No: 9) and HMW4
(SEQ ID No: 10);

Figure 11 illustrates a Western immunoblot assay of
phage lysates containing either the HMW1 or HMW2
recombinant proteins. Lysates were probed with an
E. coli-absorbed adult serum sample with high-titer antibody
against high molecular weight proteins. The arrows
indicate the major immunoreactive bands of 125 and 120
kDa in the HMW1 and HMW2 lysates, respectively;

Figure 12 is a Western immunoblot assay of cell
sonicates prepared from E. coli transformed with plasmid
pT7-7 (lanes 1 and 2), pHMW1-2 (lanes 3 and 4), pHMW1-4
(lanes 5 and 6) or pHMW1-14 (lanes 7 and 8). The
sonicates were probed with an E. coli-absorbed adult
serum sample with high-titer antibody against high
molecular weight proteins. Lanes labelled U and I
represent sonicates prepared before and after induction
of the growing samples with IPTG, respectively. The
arrows indicate protein bands of interest, as discussed
below;

Figure 13 is a graphical illustration of an ELISA
with rHMW1 antiserum assayed against purified filamentous
hemagglutinin of B. pertussis. Ab - antibody;

Figure 14 is a Western immunoblot assay of cell
sonicates from a panel of epidemiologically unrelated
non-typeable H. influenzae strains. The sonicates were
probed with rabbit antiserum prepared against HMW1-4
recombinant protein. The strain designations are
indicated by the numbers below each lane;
Figure 15 is a Western immunoblot assay of cell
sonicates from a panel of epidemiologically unrelated
non-typeable H. influenzae strains. The sonicates were
probed with monoclonal antibody X3C, a murine IgG
antibody which recognizes the filamentous hemagglutinin
of B. pertussis. The strain designations are indicated
by the numbers below each lane;

Figure 16 shows an immunoblot assay of cell
sonicates of non-typeable H. influenzae strain 12
derivatives. The sonicates were probed with rabbit
antiserum prepared against HMW1 recombinant protein.
Lanes: 1, wild-type strain; 2, HMW2⁻ mutant; 3, HMW1⁻
mutant; 4, HMW1⁻ HMW2⁻ double mutant;

Figure 17 shows middle ear bacterial counts in PBS-immunized
control animals (left panel) and HMW1/HMW2-immunized
animals (right panel) seven days after middle
ear inoculation with non-typeable Haemophilus influenzae
strain 12. Data are log-transformed, and the horizontal
lines indicate the means and standard deviations of
middle ear fluid bacterial counts for only the infected
animals in each group;

Figure 18 is a schematic diagram of pGEMEX®-hmw1
recombinant plasmids. The restriction enzymes are B-BamHI,
E-EcoRI, C-ClaI, RV-EcoRV, Bst-BstEII and H-HindIII;

Figure 19 is a schematic diagram of pGEMEX®-hmw2
recombinant plasmids. The restriction enzymes are E-EcoRI,
H-HindIII, Hc-HincII, M-MluI and X-XhoI;

Figure 20 is an immunoelectron micrograph of
representative non-typeable Haemophilus influenzae
strains after incubation with monoclonal antibody AD6
followed by incubation with goat anti-mouse IgG
conjugated with 10-nm colloidal gold particles. Strains
are: upper left panel, strain 12; upper right panel, strain
12 mutant deficient in expression of the high molecular
weight proteins; lower left panel, strain 5; lower right
panel, strain 15;

Figure 21 is a Western immunoblot assay with MAb AD6
and HMW1 or HMW2 recombinant proteins. The upper left
panel indicates the segments of the hmw1A or hmw2A structural
genes which are being expressed in the recombinant
proteins. The lane numbers correspond to the indicated
segments;

Figure 22 is a Western immunoblot assay with MAb
10C5 and HMW1 or HMW2 recombinant proteins. The upper
panel indicates the segments of the hmw1A or hmw2A
structural genes which are being expressed in the
recombinant proteins. The lane numbers correspond to the
indicated segments; and
Figure 23 is a Western immunoblot assay with MAb AD6
and a panel of unrelated non-typeable Haemophilus
influenzae strains which express HMW1/HMW2-like proteins.
Cell sonicates were prepared from freshly grown samples
of each strain prior to analysis in the Western blot.

GENERAL DESCRIPTION OF INVENTION
The DNA sequences of the genes coding for the HMW1
and HMW2 proteins of non-typeable Haemophilus influenzae
strain 12, shown in Figures 1 and 3 respectively, were
shown to be about 80% identical, with the first 1259 base
pairs of the genes being identical. The open reading
frames extend from nucleotides 351 to 4958 and from
nucleotides 382 to 4782, respectively. The derived amino
acid sequences of the two HMW proteins, shown in Figures
2 and 4 respectively, are about 70% identical.
Furthermore, the encoded proteins are antigenically
related to the filamentous hemagglutinin surface protein
of Bordetella pertussis. A monoclonal antibody prepared
against filamentous hemagglutinin (FHA) of Bordetella
pertussis was found to recognize both of the high
molecular weight proteins. These data suggest that the
HMW and FHA proteins may serve similar biological
functions. The derived amino acid sequences of the HMW1
and HMW2 proteins show sequence similarity to that of
the FHA protein. It has further been shown that these
antigenically related proteins are produced by the
majority of the non-typeable strains of Haemophilus.
Antisera raised against the protein expressed by the HMW1
gene recognize both the HMW2 protein and the B.
pertussis FHA. The present invention includes an
isolated and purified high molecular weight protein of
non-typeable Haemophilus which is antigenically related
to the B. pertussis FHA and which may be obtained from
natural sources or produced recombinantly.

A phage genomic library of a known strain of
non-typeable Haemophilus was prepared by standard methods,
and the library was screened for clones expressing high
molecular weight proteins using a high-titre antiserum
against HMWs. A number of strongly reactive DNA clones
were plaque-purified and subcloned into a T7 expression
plasmid. It was found that they all expressed one
or the other of the two high molecular weight proteins,
designated HMW1 and HMW2, with apparent molecular weights
of 125 and 120 kDa, respectively, encoded by open reading
frames of 4.6 kb and 4.4 kb, respectively.
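The ORF sizes quoted above follow from the nucleotide coordinates given for Figures 1 and 3 (351 to 4958 for hmw1A and 382 to 4782 for hmw2A). A minimal sketch of that arithmetic is shown below, assuming 1-based coordinates that are inclusive at both ends and assuming the end coordinate includes the stop codon; neither convention is stated explicitly in the text, and the helper names are illustrative.

```python
# ORF coordinates as given in the text for Figures 1 and 3
# (assumed 1-based and inclusive at both ends).
STRUCTURAL_ORFS = {
    "hmw1A": (351, 4958),
    "hmw2A": (382, 4782),
}


def orf_length(start: int, end: int) -> int:
    """Number of nucleotides from start to end, inclusive."""
    return end - start + 1


def describe(start: int, end: int) -> dict:
    """Length in bp and kb, whether it is a whole number of codons,
    and the codon count (the stop codon is counted only if it lies
    within the coordinates, which is an assumption here)."""
    length = orf_length(start, end)
    return {
        "bp": length,
        "kb": round(length / 1000, 1),
        "in_frame": length % 3 == 0,
        "codons": length // 3,
    }


if __name__ == "__main__":
    for gene, (start, end) in STRUCTURAL_ORFS.items():
        print(gene, describe(start, end))
```

Run as a script, this reports 4608 bp (about 4.6 kb) for hmw1A and 4401 bp (about 4.4 kb) for hmw2A, both whole multiples of three.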

Representative clones expressing either HMW1 or HMW2
were further characterized, and the genes were isolated,
purified and sequenced. The DNA sequence of HMW1 is
shown in Figure 1 and the corresponding derived amino
acid sequence in Figure 2. Similarly, the DNA sequence of
HMW2 is shown in Figure 3 and the corresponding derived
amino acid sequence in Figure 4. Partial purification of
the isolated proteins and N-terminal sequence analysis
indicated that the expressed proteins are truncated, since
their sequence starts at residue number 442 of both the
full-length HMW1 and HMW2 gene products.
Subcloning studies with respect to the hmw1 and hmw2
genes indicated that correct processing of the HMW
proteins required the products of additional downstream
genes. It has been found that both the hmw1 and hmw2
genes are flanked by two additional downstream open
reading frames (ORFs), designated b and c
(see Figures 6 and 7).

The b ORFs are 1635 bp in length, extending from
nucleotides 5114 to 6748 in the case of hmw1 and
nucleotides 5375 to 7009 in the case of hmw2, with their
derived amino acid sequences being 99% identical. The
derived amino acid sequences demonstrate similarity with
the derived amino acid sequences of two genes which
encode proteins required for secretion and activation of
hemolysins of P. mirabilis and S. marcescens.

The c ORFs are 1950 bp in length, extending from
nucleotides 7062 to 9011 in the case of hmw1 and
nucleotides 7249 to 9198 in the case of hmw2, with their
derived amino acid sequences being 96% identical. The hmw1 c
ORF is preceded by a series of 9 bp direct tandem
repeats. In plasmid subclones, interruption of the hmw1
b or c ORF results in defective processing and secretion
of the hmw1 structural gene product.
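The stated lengths of the downstream b and c ORFs, and the size of the region separating them, can be checked against the coordinates listed for the gene clusters of Figures 6 and 7. The sketch below assumes 1-based coordinates that are inclusive at both ends; the `Orf` type and `intergenic_gap` helper are illustrative names, not part of the specification.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    name: str
    start: int  # first nucleotide (assumed 1-based, inclusive)
    end: int    # last nucleotide (assumed inclusive)

    @property
    def length(self) -> int:
        """ORF length in bp under the inclusive-coordinate assumption."""
        return self.end - self.start + 1


# Downstream ORF coordinates as listed for Figures 6 (hmw1) and 7 (hmw2).
DOWNSTREAM = {
    "hmw1": (Orf("b", 5114, 6748), Orf("c", 7062, 9011)),
    "hmw2": (Orf("b", 5375, 7009), Orf("c", 7249, 9198)),
}


def intergenic_gap(upstream: Orf, downstream: Orf) -> int:
    """Number of nucleotides lying strictly between two consecutive ORFs."""
    return downstream.start - upstream.end - 1


if __name__ == "__main__":
    for cluster, (b, c) in DOWNSTREAM.items():
        print(cluster, b.length, c.length, intergenic_gap(b, c))
```

For both clusters this gives 1635 bp for b and 1950 bp for c, matching the text; the b-to-c spacer computed from these coordinates is 313 nt in the hmw1 cluster and 239 nt in the hmw2 cluster.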

The two high molecular weight proteins HMW1 and HMW2
have been isolated and purified by the procedures
described below in the Examples and have been shown to be
protective against otitis media in chinchillas and to
function as adhesins. These results indicate the
potential for use of such high molecular weight proteins and
structurally related proteins of other non-typeable
strains of Haemophilus influenzae as components in
immunogenic compositions for protecting a susceptible
host, such as a human infant, against disease caused by
infection with non-typeable Haemophilus influenzae.

Since the proteins provided herein are good
cross-reactive antigens and are present in the majority
of non-typeable Haemophilus strains, it is evident that
these HMW proteins may become integral constituents of a
universal Haemophilus vaccine. Indeed, these proteins
may be used not only as protective antigens against
otitis, sinusitis and bronchitis caused by the
non-typeable Haemophilus strains, but also as
carriers for the protective Hib polysaccharides in a
conjugate vaccine against meningitis. The proteins also
may be used as carriers for other antigens, haptens and
polysaccharides from other organisms, so as to induce
immunity to such antigens, haptens and polysaccharides.

The nucleotide sequences encoding two high molecular
weight proteins, designated HMW3 and HMW4, of a different
non-typeable Haemophilus strain, namely strain 5, have
been elucidated and are presented in Figures 8 and 9
(SEQ ID Nos: 7 and 8). HMW3 has an apparent molecular
weight of 125 kDa, while HMW4 has an apparent molecular
weight of 123 kDa. These high molecular weight proteins
are antigenically related to the HMW1 and HMW2 proteins
and to FHA. Figure 10 contains a multiple sequence
comparison of the derived amino acid sequences for the
four high molecular weight proteins identified herein
(HMW1, SEQ ID No: 2; HMW2, SEQ ID No: 4; HMW3, SEQ ID No:
9; HMW4, SEQ ID No: 10). As may be seen from this
comparison, stretches of identical amino acid sequence
may be found throughout the length of the comparison,
with HMW3 more closely resembling HMW1 and HMW4 more
closely resembling HMW2. This information is highly
suggestive of considerable sequence homology between
high molecular weight proteins from various non-typeable
Haemophilus strains. This information also suggests
that the HMW3 and HMW4 proteins will have the same
immunological properties as the HMW1 and HMW2 proteins,
and that corresponding HMW proteins from other non-typeable
Haemophilus strains will have the same
immunological properties as the HMW1 and HMW2 proteins.
In addition, mutants of non-typeable H. influenzae
strains that are deficient in expression of HMW1 or HMW2
or both have been constructed and examined for their
capacity to adhere to cultured human epithelial cells.
The hmw1 and hmw2 gene clusters have been expressed in
E. coli and have been examined for in vitro adherence. The
results of such experimentation, described below,
demonstrate that both HMW1 and HMW2 mediate attachment
and hence are adhesins, and that this function is present
even in the absence of other H. influenzae surface
structures. The ability of a bacterial surface protein
to function as an adhesin provides strong in vitro
evidence for its potential role as a protective antigen.
In view of the considerable sequence homology between the
HMW3 and HMW4 proteins and the HMW1 and HMW2 proteins,
these results indicate that HMW3 and HMW4 also are likely
to function as adhesins and that other HMW proteins of
other strains of non-typeable Haemophilus influenzae
similarly are likely to function as adhesins. This
expectation is borne out by the results described in the
Examples below.

With the isolation and purification of the high
molecular weight proteins, the inventor is able to
determine the major protective epitopes of the proteins
by conventional epitope mapping, and to synthesize peptides
corresponding to these determinants for incorporation
into fully synthetic or recombinant vaccines.
Accordingly, the invention also comprises a synthetic
peptide having at least six and no more than 150 amino
acids and having an amino acid sequence corresponding to
at least one protective epitope of a high molecular
weight protein of a non-typeable Haemophilus influenzae.
Such peptides, of varying length, constitute
portions of the high molecular weight proteins that can
be used to induce immunity, either directly or as part of
a conjugate, against the respective organisms, and thus
constitute active components of immunogenic compositions
for protection against the corresponding diseases.

In particular, the applicant has sought to identify
regions of the high molecular weight proteins which are
demonstrated experimentally to be surface-exposed B-cell
epitopes and which are common to all, or at least a large
number, of non-typeable strains of Haemophilus influenzae.
The strategy which has been adopted by the inventor has
been to:

(a) generate a panel of monoclonal antibodies
reactive with the high molecular weight proteins;

(b) screen those monoclonal antibodies for
reactivity with surface epitopes of intact bacteria
using immunoelectron microscopy or another suitable
screening technique;

(c) map the epitopes recognized by the monoclonal
antibodies by determining the reactivity of the
monoclonals with a panel of recombinant fusion
proteins; and

(d) determine the reactivity of the monoclonal
antibodies with heterologous non-typeable Haemophilus
influenzae strains using standard Western blot
assays.

Using this approach, the inventor has identified one
monoclonal antibody, designated AD6 (ATCC ), which
recognized a surface-exposed B-cell epitope common to all
non-typeable H. influenzae strains which express the HMW1 and
HMW2 proteins. The epitope recognized by this antibody
was mapped to a 75 amino acid sequence at the carboxy
termini of both the HMW1 and HMW2 proteins. The ability to
identify shared surface-exposed epitopes on the high
molecular weight adhesion proteins suggests that it would
be possible to develop recombinant or synthetic peptide-based
vaccines which would be protective against disease
caused by the majority of non-typeable Haemophilus

The present invention also provides any variant or
fragment of the proteins that retains the potential
immunological ability to protect against disease caused
by non-typeable Haemophilus strains. The variants may be
constructed by partial deletions or mutations of the
genes and expression of the resulting modified genes to
give the protein variants.

It is clearly apparent to one skilled in the art
that the various embodiments of the present invention
have many applications in the fields of vaccination,
diagnosis, treatment of bacterial infections and the
generation of immunological reagents. A further non-limiting
discussion of such uses is presented below.

1. Vaccine Preparation and Use

Immunogenic compositions, suitable to be used as
vaccines, may be prepared from the high molecular weight
proteins of Haemophilus influenzae, as well as analogs
and fragments thereof, and synthetic peptides containing
epitopes of the protein, as disclosed herein. The
immunogenic composition elicits an immune response which
produces antibodies, including anti-high molecular weight
protein antibodies and antibodies that are opsonizing or
bactericidal.

Immunogenic compositions, including vaccines, may be
prepared as injectables, as liquid solutions or
emulsions. The active component may be mixed with
pharmaceutically acceptable excipients which are
compatible therewith. Such excipients may include
water, saline, dextrose, glycerol, ethanol, and
combinations thereof. The immunogenic compositions and
vaccines may further contain auxiliary substances, such
as wetting or emulsifying agents, pH buffering agents, or
adjuvants to enhance the effectiveness thereof.

Immunogenic compositions and vaccines may be administered
parenterally, by injection subcutaneously or
intramuscularly. Alternatively, the immunogenic
compositions formed according to the present invention
may be formulated and delivered in a manner to evoke an
immune response at mucosal surfaces. Thus, the
immunogenic composition may be administered to mucosal
surfaces by, for example, the nasal or oral
(intragastric) routes. Alternatively, other modes of
administration, including suppositories and oral
formulations, may be desirable. For suppositories,
binders and carriers may include, for example,
polyalkylene glycols or triglycerides. Oral formulations
may include normally employed excipients such as, for
example, pharmaceutical grades of saccharin, cellulose
and magnesium carbonate. These compositions can take the
form of solutions, suspensions, tablets, pills, capsules,
sustained release formulations or powders and contain
about 1 to 95% of the active component. The immunogenic
preparations and vaccines are administered in a manner
compatible with the dosage formulation, and in such
amount as will be therapeutically effective, protective
and immunogenic. The quantity to be administered depends
on the subject to be treated, including, for example, the
capacity of the individual's immune system to synthesize
antibodies and, if needed, to produce a cell-mediated
immune response. Precise amounts of active ingredient
required to be administered depend on the judgment of the
practitioner. However, suitable dosage ranges are
readily determinable by one skilled in the art and may be
of the order of micrograms of the HMW proteins. Suitable
regimes for initial administration and booster doses are
also variable, but may include an initial administration
followed by subsequent administrations. The dosage may
also depend on the route of administration and will vary
according to the size of the host.

The concentration of the active component in an
immunogenic composition according to the invention is in
general about 1 to 95%. A vaccine which contains
antigenic material of only one pathogen is a monovalent
vaccine. Vaccines which contain antigenic material of
several pathogens are combined vaccines and also belong
to the present invention. Such combined vaccines
contain, for example, material from various pathogens or
from various strains of the same pathogen, or from
combinations of various pathogens.

Immunogenicity can be significantly improved if the
antigens are co-administered with adjuvants, commonly
used as 0.05 to 0.1 percent solution in phosphate-buffered
saline. Adjuvants enhance the immunogenicity of
an antigen but are not necessarily immunogenic
themselves. Adjuvants may act by retaining the antigen
locally near the site of administration to produce a
depot effect facilitating a slow, sustained release of
antigen to cells of the immune system. Adjuvants can
also attract cells of the immune system to an antigen
depot and stimulate such cells to elicit immune
responses.

Immunostimulatory agents or adjuvants have been used
for many years to improve the host immune responses to,
for example, vaccines. Intrinsic adjuvants, such as
lipopolysaccharides, normally are the components of the
killed or attenuated bacteria used as vaccines.
Extrinsic adjuvants are immunomodulators which are
typically non-covalently linked to antigens and are
formulated to enhance the host immune responses. Thus,
adjuvants have been identified that enhance the immune
response to antigens delivered parenterally. Some of
these adjuvants are toxic, however, and can cause
undesirable side-effects, making them unsuitable for use
in humans and many animals. Indeed, only aluminum
hydroxide and aluminum phosphate (collectively commonly
referred to as alum) are routinely used as adjuvants in
human and veterinary vaccines. The efficacy of alum in
increasing antibody responses to diphtheria and tetanus
toxoids is well established and an HBsAg vaccine has been
adjuvanted with alum. While the usefulness of alum is
well established for some applications, it has
limitations. For example, alum is ineffective for
influenza vaccination and inconsistently elicits a
cell-mediated immune response. The antibodies elicited by
alum-adjuvanted antigens are mainly of the IgG1 isotype
in the mouse, which may not be optimal for protection by
some vaccinal agents.

A wide range of extrinsic adjuvants can provoke
potent immune responses to antigens. These include
saponins complexed to membrane protein antigens (immune
stimulating complexes), pluronic polymers with mineral
oil, killed mycobacteria in mineral oil, Freund's
complete adjuvant, bacterial products, such as muramyl
dipeptide (MDP) and lipopolysaccharide (LPS), as well as
lipid A, and liposomes.

To efficiently induce humoral immune responses (HIR)
and cell-mediated immunity (CMI), immunogens are often
emulsified in adjuvants. Many adjuvants are toxic,
inducing granulomas, acute and chronic inflammations
(Freund's complete adjuvant, FCA), cytolysis (saponins
and Pluronic polymers) and pyrogenicity, arthritis and
anterior uveitis (LPS and MDP). Although FCA is an
excellent adjuvant and widely used in research, it is not
licensed for use in human or veterinary vaccines because
of its toxicity.

Desirable characteristics of ideal adjuvants
include:

(1) lack of toxicity;

(2) ability to stimulate a long-lasting immune response;

(3) simplicity of manufacture and stability in long-term
storage;

(4) ability to elicit both CMI and HIR to antigens
administered by various routes, if required;

(5) synergy with other adjuvants;

(6) capability of selectively interacting with
populations of antigen presenting cells (APC);

(7) ability to specifically elicit appropriate TH1 or
TH2 cell-specific immune responses; and

(8) ability to selectively increase appropriate antibody
isotype levels (for example, IgA) against antigens.

U.S. Patent No. 4,855,283 granted to Lockhoff et al.
on August 8, 1989, which is incorporated herein by
reference thereto, teaches glycolipid analogues including
N-glycosylamides, N-glycosylureas and
N-glycosylcarbamates, each of which is substituted in the
sugar residue by an amino acid, as immunomodulators or
adjuvants. Thus, Lockhoff et al. (US Patent No.
4,855,283 and ref. 29) reported that N-glycolipid analogs
displaying structural similarities to the
naturally-occurring glycolipids, such as glycosphingolipids and
glycoglycerolipids, are capable of eliciting strong
immune responses in both herpes simplex virus vaccine and
pseudorabies virus vaccine. Some glycolipids have been
synthesized from long-chain alkylamines and fatty acids
that are linked directly with the sugars through the
anomeric carbon atom, to mimic the functions of the
naturally occurring lipid residues.

U.S. Patent No. 4,258,029 granted to Moloney,
incorporated herein by reference thereto, teaches that
octadecyl tyrosine hydrochloride (OTH) functioned as an
adjuvant when complexed with tetanus toxoid and
formalin-inactivated type I, II and III poliomyelitis virus
vaccine. Also, Nixon-George et al. (ref. 30) reported
that octadecyl esters of aromatic amino acids complexed
with a recombinant hepatitis B surface antigen enhanced
the host immune responses against hepatitis B virus.

Lipidation of synthetic peptides has also been used
to increase their immunogenicity. Thus, Wiesmuller 1989
describes a peptide with a sequence homologous to a
foot-and-mouth disease viral protein coupled to an adjuvant,
tripalmityl-S-glyceryl-cysteinylserylserine, being a
synthetic analogue of the N-terminal part of the
lipoprotein from Gram-negative bacteria. Furthermore,
Deres et al. 1989 reported in vivo priming of
virus-specific cytotoxic T lymphocytes with a synthetic
lipopeptide vaccine which comprised modified synthetic
peptides derived from influenza virus nucleoprotein by
linkage to a lipopeptide,
N-palmityl-S-[2,3-bis(palmityloxy)-(2RS)-propyl]-[R]-cysteine (TPC).

2. Immunoassays

The high molecular weight protein of Haemophilus
influenzae of the present invention is useful as an
immunogen for the generation of anti-protein antibodies,
and as an antigen in immunoassays, including enzyme-linked
immunosorbent assays (ELISA), RIAs and other non-enzyme
linked antibody binding assays or procedures known in the
art for the detection of antibodies. In ELISA assays,
the protein is immobilized onto a selected surface, for
example, a surface capable of binding proteins, such as
the wells of a polystyrene microtiter plate. After
washing to remove incompletely adsorbed protein, a
nonspecific protein, such as a solution of bovine serum
albumin (BSA) that is known to be antigenically neutral
with regard to the test sample, may be bound to the
selected surface. This allows for blocking of
nonspecific adsorption sites on the immobilizing surface
and thus reduces the background caused by nonspecific
binding of antisera onto the surface.

The immobilizing surface is then contacted with a
sample, such as clinical or biological materials, to be
tested in a manner conducive to immune complex
(antigen/antibody) formation. This may include diluting
the sample with diluents, such as solutions of BSA,
bovine gamma globulin (BGG) and/or phosphate buffered
saline (PBS)/Tween. The sample is then allowed to
incubate for about 2 to 4 hours, at temperatures
of the order of about 25°C to 37°C. Following
incubation, the sample-contacted surface is washed to
remove non-immunocomplexed material. The washing
procedure may include washing with a solution, such as
PBS/Tween or a borate buffer. Following formation of
specific immunocomplexes between the test sample and the
bound protein, and subsequent washing, the occurrence,
and even amount, of immunocomplex formation may be
determined by subjecting the immunocomplex to a second
antibody having specificity for the first antibody. If
the test sample is of human origin, the second antibody
is an antibody having specificity for human
immunoglobulins and in general IgG. To provide detecting
means, the second antibody may have an associated
activity, such as an enzymatic activity that will
generate, for example, a colour development upon
incubating with an appropriate chromogenic substrate.
Quantification may then be achieved by measuring the
degree of colour generation using, for example, a
visible-spectrum spectrophotometer.
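The quantification step can be illustrated with a short sketch that turns a colour reading into an amount by interpolating on a standard curve. It assumes a reference dilution series of known concentration read on the same plate, a blank well for background correction and a monotonic curve; the function name, units and readings are hypothetical, and linear interpolation between neighbouring standards is only one of several ways such curves are fitted.

```python
from bisect import bisect_left


def interpolate_concentration(sample_od, blank_od, standards):
    """Estimate a concentration from an ELISA absorbance reading.

    standards: (concentration, absorbance) pairs from a reference
    dilution series read on the same plate (hypothetical data).
    All absorbances are background-corrected by subtracting blank_od,
    and the sample is placed on the curve by linear interpolation
    between the two neighbouring standards. Readings outside the
    curve raise ValueError instead of being extrapolated.
    """
    points = sorted(((conc, od - blank_od) for conc, od in standards),
                    key=lambda p: p[1])
    ods = [od for _, od in points]
    y = sample_od - blank_od
    if not ods[0] <= y <= ods[-1]:
        raise ValueError("reading lies outside the standard curve")
    i = bisect_left(ods, y)
    if ods[i] == y:
        return points[i][0]
    (c0, a0), (c1, a1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (y - a0) / (a1 - a0)


# Hypothetical standard curve (ng/ml, absorbance) and one sample well.
curve = [(0, 0.05), (10, 0.25), (20, 0.45), (40, 0.85)]
print(round(interpolate_concentration(0.35, 0.05, curve), 1))  # 15.0
```

Refusing to extrapolate is a deliberate choice here: a reading above the top standard would normally be re-assayed at a higher dilution rather than estimated.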

3. Use of Sequences as Hybridization Probes

The nucleotide sequences of the present invention,
comprising the sequences of the genes encoding the high
molecular weight proteins of specific strains of
non-typeable Haemophilus influenzae, now allow for the
identification and cloning of the genes from any species
of non-typeable Haemophilus and other strains of
non-typeable Haemophilus influenzae.

The nucleotide sequences comprising the sequences of
the genes of the present invention are useful for their
ability to selectively form duplex molecules with
complementary stretches of other genes of high molecular
weight proteins of non-typeable Haemophilus. Depending
on the application, a variety of hybridization conditions
may be employed to achieve varying degrees of selectivity
of the probe toward the other genes. For a high degree
of selectivity, relatively stringent conditions are used
to form the duplexes, such as low salt and/or high
temperature conditions, such as provided by 0.02 M to
0.15 M NaCl at temperatures of between about 50°C and 70°C.
For some applications, less stringent hybridization
conditions are required, such as 0.15 M to 0.9 M salt, at
temperatures ranging from about 20°C to 55°C.
Hybridization conditions can also be rendered more
stringent by the addition of increasing amounts of
formamide, to destabilize the hybrid duplex. Thus,
particular hybridization conditions can be readily
manipulated, and will generally be a method of choice
depending on the desired results. In general, convenient
hybridization temperatures in the presence of 50%
formamide are: 42°C for a probe which is 95 to 100%
homologous to the target fragment, 37°C for 90 to 95%
homology and 32°C for 85 to 90% homology.
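The temperature guide above can be written as a small lookup, shown here next to a commonly quoted empirical estimate of DNA duplex melting temperature for comparison. The Tm expression (81.5 + 16.6 log10[Na+] + 0.41 %GC - 0.63 %formamide - 600/L) is a textbook approximation that does not appear in this specification, and its coefficients differ slightly between references, so both functions are a sketch only. The 5°C steps per 5% change in homology in the guide are consistent with the often-cited rule that each 1% of mismatch lowers Tm by roughly 1°C.

```python
import math


def formamide_hyb_temp(percent_homology):
    """Convenient hybridization temperature (deg C) in 50% formamide,
    following the guide in the text (covers 85-100% homology only)."""
    if 95 <= percent_homology <= 100:
        return 42
    if 90 <= percent_homology < 95:
        return 37
    if 85 <= percent_homology < 90:
        return 32
    raise ValueError("the guide covers 85 to 100% homology only")


def estimate_tm(na_molar, percent_gc, percent_formamide, length_bp):
    """Empirical DNA:DNA hybrid Tm (deg C).

    Textbook approximation, not taken from the specification:
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.63*(%formamide) - 600/L
    """
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc
            - 0.63 * percent_formamide - 600 / length_bp)


print(formamide_hyb_temp(92))  # 37
# Illustrative values: 0.15 M Na+, 40% G+C, 50% formamide, 60-bp probe.
print(round(estimate_tm(0.15, 40, 50, 60), 1))  # about 42.7
```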

In a clinical diagnostic embodiment, the nucleic
acid sequences of the genes of the present invention may
be used in combination with an appropriate means, such as
a label, for determining hybridization. A wide variety
of appropriate indicator means are known in the art,
including radioactive, enzymatic or other ligands, such
as avidin/biotin, which are capable of providing a
detectable signal. In some diagnostic embodiments, an
enzyme tag such as urease, alkaline phosphatase or
peroxidase may be used instead of a radioactive tag. In
the case of enzyme tags, colorimetric indicator
substrates are known which can be employed to provide a
means visible to the human eye or spectrophotometrically,
to identify specific hybridization with samples
containing gene sequences encoding high molecular weight
proteins of non-typeable Haemophilus.

The nucleic acid sequences of genes of the present
invention are useful as hybridization probes in solution
hybridizations and in embodiments employing solid-phase
procedures. In embodiments involving solid-phase
procedures, the test DNA (or RNA) from samples, such as
clinical samples, including exudates, body fluids (e.g.,
serum, amniotic fluid, middle ear effusion, sputum,
bronchoalveolar lavage fluid) or even tissues, is
adsorbed or otherwise affixed to a selected matrix or
surface. The fixed, single-stranded nucleic acid is then
subjected to specific hybridization with selected probes
comprising the nucleic acid sequences of the genes or
fragments thereof of the present invention under desired
conditions. The selected conditions will depend on the
particular circumstances and the criteria required, for
example, the G+C content, type of target nucleic acid,
source of nucleic acid and size of hybridization probe.
Following washing of the hybridization surface so as to
remove non-specifically bound probe molecules, specific
hybridization is detected, or even quantified, by means
of the label. As with the selection of peptides, it is
preferred to select nucleic acid sequence portions which
are conserved among species of non-typeable Haemophilus.
The selected probe may be at least about 18 bp and may be
in the range of about 30 bp to about 90 bp long.
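Choosing probe regions that are conserved among strains can be sketched as a scan over a multiple sequence alignment for windows that are identical in every sequence and free of gaps. The alignment is assumed to exist already, exact identity is a deliberately strict stand-in for "conserved", and the toy sequences are invented; on real data the window length would be set within the probe sizes given above, and G+C content and the intended hybridization conditions would also be weighed.

```python
def conserved_windows(aligned, length):
    """Return (start, sequence) for each window of `length` alignment
    columns that is identical in every sequence and has no gap ('-').

    aligned: equal-length, pre-aligned sequences (hypothetical input).
    """
    if not aligned:
        raise ValueError("no sequences given")
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for start in range(width - length + 1):
        window = aligned[0][start:start + length]
        if "-" in window:
            continue
        # Keep the window only if every other sequence matches it exactly.
        if all(s[start:start + length] == window for s in aligned[1:]):
            hits.append((start, window))
    return hits


toy_alignment = ["ACGTACGTTT", "ACGTACGTAT", "ACGTACCTTT"]
print(conserved_windows(toy_alignment, 5))  # [(0, 'ACGTA'), (1, 'CGTAC')]
```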

4. Expression of the High Molecular Weight Protein
Genes

Plasmid vectors containing replicon and control
sequences which are derived from species compatible with
the host cell may be used for the expression of the genes
encoding high molecular weight proteins of non-typeable
Haemophilus in expression systems. The vector ordinarily
carries a replication site, as well as marking sequences
which are capable of providing phenotypic selection in
transformed cells. For example, E. coli may be
transformed using pBR322, which contains genes for
ampicillin and tetracycline resistance and thus provides
easy means for identifying transformed cells. The pBR322
plasmid, or other microbial plasmid or phage, must also
contain, or be modified to contain, promoters which can
be used by the host cell for expression of its own
proteins.

In addition, phage vectors containing replicon and
control sequences that are compatible with the host can
be used as a transforming vector in connection with these
hosts. For example, the phage in lambda GEM™-11 may be
utilized in making recombinant phage vectors which can be
used to transform host cells, such as E. coli LE392.

Promoters commonly used in recombinant DNA
construction include the β-lactamase (penicillinase) and
lactose promoter systems (Chang et al., 1978; Itakura et
al., 1977; Goeddel et al., 1979; Goeddel et al., 1980) and
other microbial promoters such as the T7 promoter system
(U.S. Patent 4,952,496). Details concerning the
nucleotide sequences of promoters are known, enabling a
skilled worker to ligate them functionally with genes.
The particular promoter used will generally be a matter
of choice depending upon the desired results. Hosts that
are appropriate for expression of the genes encoding the
high molecular weight proteins, fragments, analogs or
variants thereof include E. coli, Bacillus species,
Haemophilus, fungi and yeast; the baculovirus expression
system may also be used.

In accordance with this invention, it is preferred
to make the high molecular weight proteins by recombinant
methods, particularly since the naturally occurring high
molecular weight protein as purified from a culture of a
species of non-typeable Haemophilus may include trace
amounts of toxic materials or other contaminants. This
problem can be avoided by using recombinantly produced
proteins in heterologous systems which can be isolated
from the host in a manner to minimize contaminants in the
purified material. Particularly desirable hosts for
expression in this regard include Gram-positive bacteria
which do not have LPS and are, therefore, endotoxin free.
Such hosts include species of Bacillus and may be
particularly useful for the production of non-pyrogenic
high molecular weight protein, fragments or analogs
thereof. Furthermore, recombinant methods of production
permit the manufacture of HMW1, HMW2, HMW3 or HMW4, and
corresponding HMW proteins from other non-typeable
Haemophilus influenzae strains, or fragments thereof,
separate from one another and devoid of non-HMW protein
of non-typeable Haemophilus influenzae.

Biological Deposits 

Certain hybridomas producing monoclonal antibodies
specific for high molecular weight protein of Haemophilus
influenzae according to aspects of the present invention
that are described and referred to herein have been
deposited with the American Type Culture Collection
(ATCC) located at 12301 Parklawn Drive, Rockville,
Maryland, USA, 20852, pursuant to the Budapest Treaty and
prior to the filing of this application. Samples of the
deposited hybridomas will become available to the public
upon grant of a patent based upon this United States
patent application. The invention described and claimed
herein is not to be limited in scope by the hybridomas
deposited, since the deposited embodiment is intended
only as an illustration of the invention. Any equivalent
or similar hybridomas that produce similar or equivalent
antibodies as described in this application are within
the scope of the invention.

Deposit summary

Hybridoma    ATCC Designation    Date Deposited

AD6
10C5

EXAMPLES

The above disclosure generally describes the present
invention. A more complete understanding can be obtained
by reference to the following specific Examples. These
Examples are described solely for purposes of
illustration and are not intended to limit the scope of
the invention. Changes in form and substitution of
equivalents are contemplated as circumstances may suggest
or render expedient. Although specific terms have been
employed herein, such terms are intended in a descriptive
sense and not for purposes of limitation.

Methods of molecular genetics, protein biochemistry,
and immunology used but not explicitly described in this
disclosure and these Examples are amply reported in the
scientific literature and are well within the ability of
those skilled in the art.

Example 1:

This Example describes the isolation of DNA encoding
HMW1 and HMW2 proteins, cloning and expression of such
proteins, and sequencing and sequence analysis of the DNA
molecules encoding the HMW1 and HMW2 proteins.

Non-typeable H. influenzae strains 5 and 12 were
isolated in pure culture from the middle ear fluid of
children with acute otitis media. Sau3A partial
restriction digests of chromosomal DNA from strain 12,
which provided the genes encoding proteins HMW1 and
HMW2, were prepared and fractionated on sucrose
gradients. Fractions containing DNA fragments in the 9
to 20 kbp range were pooled and a library was prepared by
ligation into λEMBL3 arms. Ligation mixtures were
packaged in vitro and plate-amplified in a P2 lysogen of
E. coli LE392.
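How many recombinant phage such a library must contain can be estimated with the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the average insert size divided by the genome size and P is the desired probability that any given sequence is represented. The sketch below assumes a genome of about 1.83 Mb (the size reported for the sequenced H. influenzae strain Rd, used here only as a stand-in for strain 12) and an average insert of 15 kb, roughly the middle of the pooled 9 to 20 kbp fraction; both numbers are assumptions, not values from this specification.

```python
import math


def clones_required(genome_bp, insert_bp, probability=0.99):
    """Clarke-Carbon estimate of genomic library size.

    Number of independent clones needed so that any given sequence is
    present with the stated probability, assuming inserts are drawn
    at random from the genome.
    """
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be smaller than the genome")
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


# Assumed values: ~1.83 Mb genome, 15 kb average insert, 99% probability.
print(clones_required(1_830_000, 15_000))  # 560
```

Because the estimate scales roughly with genome size divided by insert size, enriching for larger fragments before ligation reduces the number of plaques that have to be screened.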

For plasmid subcloning studies, DNA from a
representative recombinant phage was subcloned into the
T7 expression plasmid pT7-7, containing the T7 RNA
polymerase promoter φ10, a ribosome-binding site and the
translational start site for the T7 gene 10 protein
upstream from a multiple cloning site (see Figure 5B).

DNA sequence analysis was performed by the dideoxy
method and both strands of the HMW1 gene and a single
strand of the HMW2 gene were sequenced.

Western immunoblot analysis was performed to
identify the recombinant proteins being produced by
reactive phage clones (Figure 11). Phage lysates grown
in LE392 cells or plaques picked directly from a lawn of
LE392 cells on YT plates were solubilized in gel
electrophoresis sample buffer prior to electrophoresis.
Sodium dodecyl sulfate polyacrylamide gel electrophoresis
(SDS-PAGE) was performed on 7.5% or 11% polyacrylamide
modified Laemmli gels. After transfer of the proteins to
nitrocellulose sheets, the sheets were probed
sequentially with an E. coli-absorbed human serum sample
containing high-titer antibody to the high-molecular-weight
proteins and then with alkaline phosphatase-conjugated
goat anti-human immunoglobulin G (IgG) second
antibody. Sera from healthy adults contain high-titer
antibody directed against surface-exposed
high-molecular-weight proteins of non-typeable H. influenzae. One such
serum sample was used as the screening antiserum after
having been extensively absorbed with LE392 cells.

To identify recombinant proteins being produced by
E. coli transformed with recombinant plasmids, the
plasmids of interest were used to transform E. coli BL21
(DE3)/pLysS. The transformed strains were grown to an
optical density of 0.5 in L broth containing 50 µg of
ampicillin per ml. IPTG was then added to 1 mM. One hour
later, cells were harvested, and a sonicate of the cells
was prepared. The protein concentrations of the samples
were determined by the bicinchoninic acid method. Cell
sonicates containing 100 µg of total protein were
solubilized in electrophoresis sample buffer, subjected
to SDS-polyacrylamide gel electrophoresis, and
transferred to nitrocellulose. The nitrocellulose was
then probed sequentially with the E. coli-absorbed adult
serum sample and then with alkaline phosphatase-conjugated
goat anti-human IgG second antibody.

Western immunoblot analysis also was performed to
determine whether homologous and heterologous
non-typeable H. influenzae strains expressed high-molecular-weight
proteins antigenically related to the protein
encoded by the cloned HMW1 gene (rHMW1). Cell sonicates
of bacterial cells were solubilized in electrophoresis
sample buffer, subjected to SDS-polyacrylamide gel
electrophoresis, and transferred to nitrocellulose.
Nitrocellulose was probed sequentially with polyclonal
rabbit rHMW1 antiserum and then with alkaline
phosphatase-conjugated goat anti-rabbit IgG second
antibody.

Finally, Western immunoblot analysis was performed
to determine whether non-typeable Haemophilus strains
expressed proteins antigenically related to the
filamentous hemagglutinin protein of Bordetella
pertussis. Monoclonal antibody X3C, a murine
immunoglobulin G (IgG) antibody which recognizes
filamentous hemagglutinin, was used to probe cell
sonicates by Western blot. An alkaline
phosphatase-conjugated goat anti-mouse IgG second antibody was used
for detection.

To generate recombinant protein antiserum, E. coli
BL21(DE3)/pLysS was transformed with pHMW1-4, and
expression of recombinant protein was induced with IPTG,
as described above. A sonicate of the bacterial
cells was prepared and separated into supernatant and
pellet fractions by centrifugation at 10,000 × g for 30
min. The recombinant protein fractionated with the
pellet fraction. A rabbit was subcutaneously immunized
on a biweekly schedule with 1 mg of protein from the pellet
fraction, the first dose given with Freund's complete
adjuvant and subsequent doses with Freund's incomplete
adjuvant. Following the fourth injection, the rabbit was
bled. Prior to use in the Western blot assay, the
antiserum was absorbed extensively with sonicates of the
host E. coli strain transformed with cloning vector
alone.

To assess the sharing of antigenic determinants
between HMW1 and filamentous hemagglutinin, enzyme-linked
immunosorbent assay (ELISA) plates (Costar, Cambridge,
Mass.) were coated with 60 µl of a 4-µg/ml solution of
filamentous hemagglutinin in Dulbecco's phosphate-buffered
saline per well for 2 h at room temperature.
Wells were blocked for 1 h with 1% bovine serum albumin
in Dulbecco's phosphate-buffered saline prior to addition
of serum dilutions. rHMW1 antiserum was serially diluted
in 0.1% Brij (Sigma, St. Louis, Mo.) in Dulbecco's
phosphate-buffered saline and incubated for 3 h at room
temperature. After being washed, the plates were
incubated with peroxidase-conjugated goat anti-rabbit IgG
antibody (Bio-Rad) for 2 h at room temperature and
subsequently developed with
2,2'-azino-bis(3-ethylbenzthiazoline-6-sulfonic acid) (Sigma) at a
concentration of 0.54 mg/ml in 0.1 M sodium citrate
buffer, pH 4.2, containing 0.03% H2O2. Absorbances were
read on an automated ELISA reader.
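Readings from a serially diluted antiserum such as this one are often summarized as an endpoint titer: the reciprocal of the highest dilution whose absorbance still exceeds a chosen cutoff. The specification does not say how its ELISA data were reduced, so the starting dilution, the two-fold series and the cutoff (which in practice is often derived from pre-immune or negative-control wells) are all assumptions in this sketch.

```python
def endpoint_titer(start_dilution, fold, absorbances, cutoff):
    """Reciprocal of the highest dilution whose absorbance exceeds cutoff.

    absorbances: readings ordered from the least dilute well
    (1:start_dilution) to the most dilute, each well `fold` times more
    dilute than the previous one. Scanning stops at the first reading
    at or below the cutoff. Returns None if no well is positive.
    """
    titer = None
    dilution = start_dilution
    for od in absorbances:
        if od <= cutoff:
            break
        titer = dilution
        dilution *= fold
    return titer


# Hypothetical two-fold series starting at 1:100, cutoff 0.2.
readings = [1.8, 1.5, 0.9, 0.4, 0.15, 0.08]
print(endpoint_titer(100, 2, readings, 0.2))  # 800
```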

Recombinant phage expressing HMW1 or HMW2 were
recovered as follows. The non-typeable H. influenzae
strain 12 genomic library was screened for clones
expressing high-molecular-weight proteins with an
E. coli-absorbed human serum sample containing a high titer
of antibodies directed against the high-molecular-weight
proteins.

Numerous strongly reactive clones were identified
along with more weakly reactive ones. Twenty strongly
reactive clones were plaque-purified and examined by
Western blot for expression of recombinant proteins.
Each of the strongly reactive clones expressed one of two
types of high-molecular-weight proteins, designated HMW1
and HMW2. The major immunoreactive protein bands in the
HMW1 and HMW2 lysates migrated with apparent molecular
masses of 125 and 120 kDa, respectively. In addition to
the major bands, each lysate contained minor protein
bands of higher apparent molecular weight. Protein bands
seen in the HMW2 lysates at molecular masses of less than
120 kDa were not regularly observed and presumably
represent proteolytic degradation products. Lysates of
LE392 infected with the λEMBL3 cloning vector alone were
non-reactive when immunologically screened with the same
serum sample. Thus, the observed activity was not due to
cross-reactive E. coli proteins or λEMBL3-encoded
proteins. Furthermore, the recombinant proteins were not
simply binding immunoglobulin nonspecifically, since the
proteins were not reactive with the goat anti-human IgG
conjugate alone, with normal rabbit sera, or with serum
from a number of healthy young infants.

Representative clones expressing either the HMW1 or
HMW2 recombinant proteins were characterized further.
The restriction maps of the two phage types were
different from each other, including the regions encoding
the HMW1 and HMW2 structural genes. Figure 5A shows
restriction maps of representative recombinant phage
which contained the HMW1 or HMW2 structural genes. The
locations of the structural genes are indicated by the
shaded bars.

HMW1 plasmid subclones were constructed by using the
T7 expression plasmid pT7-7 (Fig. 5A and B). HMW2 plasmid
subclones also were constructed, and the results with
these latter subclones were similar to those observed
with the HMW1 constructs.

The approximate location and direction of
transcription of the HMW1 structural gene were initially
determined by using plasmid pHMW1 (Fig. 5A). This
plasmid was constructed by inserting the 8.5-kb
BamHI-SalI fragment from λHMW1 into BamHI- and SalI-cut pT7-7.
E. coli transformed with pHMW1 expressed an
immunoreactive recombinant protein with an apparent
molecular mass of 115 kDa, which was strongly inducible
with IPTG. This protein was significantly smaller than
the 125-kDa major protein expressed by the parent phage,
indicating that it either was being expressed as a fusion
protein or was truncated at the carboxy terminus.

To more precisely localize the 3' end of the
structural gene, additional plasmids were constructed
with progressive deletions from the 3' end of the pHMW1
construct. Plasmid pHMW1-1 was constructed by digestion
of pHMW1 with PstI, isolation of the resulting 8.8-kb
fragment, and religation. Plasmid pHMW1-2 was
constructed by digestion of pHMW1 with HindIII, isolation
of the resulting 7.5-kb fragment, and religation.
E. coli transformed with either plasmid pHMW1-1 or pHMW1-2
also expressed an immunoreactive recombinant protein with
an apparent molecular mass of 115 kDa. These results
indicated that the 3' end of the structural gene was 5'
of the HindIII site. Figure 12 shows the Western
blot results with pHMW1-2-transformed cells before and
after IPTG induction (lanes 3 and 4, respectively). The
115-kDa recombinant protein is indicated by the arrow.
Transformants also demonstrated cross-reactive bands of
lower apparent molecular weight, which probably represent
partial degradation products. Shown for comparison are
the results for E. coli transformed with the pT7-7
cloning vector alone (Fig. 12, lanes 1 and 2).
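The deletion subclones above rely on predicting which fragment remains after cutting a circular plasmid at several sites and religating the one of interest. A minimal sketch of that bookkeeping follows; the 11,000-bp plasmid length and the two PstI positions are hypothetical values chosen only so that one product is 8.8 kb, and they are not the actual map of pHMW1.

```python
def circular_fragments(plasmid_bp, cut_sites):
    """Fragment lengths (bp), largest first, after cutting a circular
    plasmid at the given 0-based positions. A single cut linearizes
    the plasmid and returns its full length."""
    sites = sorted(set(cut_sites))
    if not sites:
        raise ValueError("at least one cut site is required")
    if any(not 0 <= s < plasmid_bp for s in sites):
        raise ValueError("cut site outside the plasmid")
    # Distances between consecutive sites ...
    fragments = [b - a for a, b in zip(sites, sites[1:])]
    # ... plus the fragment that wraps around the origin.
    fragments.append(plasmid_bp - sites[-1] + sites[0])
    return sorted(fragments, reverse=True)


# Hypothetical map: 11,000-bp plasmid with PstI sites at 2,000 and 4,200.
print(circular_fragments(11_000, [2_000, 4_200]))  # [8800, 2200]
```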

To more precisely localize the 5' end of the gene,
plasmids pHMW1-4 and pHMW1-7 were constructed. Plasmid
pHMW1-4 was constructed by cloning the 5.1-kb
BamHI-HindIII fragment from λHMW1 into a pT7-7-derived plasmid
containing the upstream 3.8-kb EcoRI-BamHI fragment.
E. coli transformed with pHMW1-4 expressed an immunoreactive
protein with an apparent molecular mass of approximately
160 kDa (Fig. 12, lane 6). Although protein production
was inducible with IPTG, the levels of protein production
in these transformants were substantially lower than
those with the pHMW1-2 transformants described above.
Plasmid pHMW1-7 was constructed by digesting pHMW1-4 with
NdeI and SpeI. The 9.0-kbp fragment generated by this
double digestion was isolated, blunt ended, and
religated. E. coli transformed with pHMW1-7 also
expressed an immunoreactive protein with an apparent
molecular mass of 160 kDa, a protein identical in size to
that expressed by the pHMW1-4 transformants. This result
indicated that the initiation codon for the HMW1
structural gene was 3' of the SpeI site. DNA sequence
analysis (described below) confirmed this conclusion.

As noted above, the λHMW1 phage clones expressed a
major immunoreactive band of 125 kDa, whereas the HMW1
plasmid clones pHMW1-4 and pHMW1-7, which contained what
was believed to be the full-length gene, expressed an
immunoreactive protein of approximately 160 kDa. This
size discrepancy was disconcerting. One possible
explanation was that an additional gene or genes
necessary for correct processing of the HMW1 gene product
were deleted in the process of subcloning. To address
this possibility, plasmid pHMW1-14 was constructed. This
construct was generated by digesting pHMW1 with NdeI and
MluI and inserting the 7.6-kbp NdeI-MluI fragment
isolated from pHMW1-4. Such a construct would contain
the full-length HMW1 gene as well as the DNA 3' of the
HMW1 gene which was present in the original HMW1 phage.
E. coli transformed with this plasmid expressed major
immunoreactive proteins with apparent molecular masses of
125 and 160 kDa as well as additional degradation
products (Fig. 12, lanes 7 and 8). The 125- and 160-kDa
bands were identical to the major and minor
immunoreactive bands detected in the HMW1 phage lysates.
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Interestingly, the pHMWl-14 construct also expressed 
significant amounts of protein in the uninduced 
condition, a situation not observed with the earlier 
constructs. 

The relationship between the 125- and 160-kDa
proteins remains somewhat unclear. Sequence analysis,
described below, reveals that the HMW1 gene would be
predicted to encode a protein of 159 kDa. It is believed
that the 160-kDa protein is a precursor form of the
mature 125-kDa protein, with the conversion from one
protein to the other being dependent on the products of
the two downstream genes.

Sequence analysis of the HMW1 gene (Figure 1)
revealed a 4,608-bp open reading frame (ORF), beginning
with an ATG codon at nucleotide 351 and ending with a TAG
stop codon at nucleotide 4959. A putative ribosome-binding
site with the sequence AGGAG begins 10 bp upstream of the
putative initiation codon. Five other in-frame ATG codons
are located within 250 bp of the beginning of the ORF,
but none of these is preceded by a typical
ribosome-binding site. The 5'-flanking region of the ORF
contains a series of direct tandem repeats, with the 7-bp
sequence ATCTTTC repeated 16 times. These tandem repeats
stop 100 bp 5' of the putative initiation codon. An 8-bp
inverted repeat characteristic of a rho-independent
transcriptional terminator is present, beginning at
nucleotide 4983, 25 bp 3' of the presumed translational
stop. Multiple termination codons are present in all
three reading frames both upstream and downstream of the
ORF. The derived amino acid sequence of the protein
encoded by the HMW1 gene (Figure 2) has a molecular
weight of 159,000, in good agreement with the apparent
molecular weights of the proteins expressed by the
pHMW1-4 and pHMW1-7 transformants. The derived amino acid
sequence of the amino terminus does not demonstrate the
characteristics of a typical signal sequence. The BamHI
site used in generation of pHMW1 comprises bp 1743
through 1748 of the nucleotide sequence. The ORF
downstream of the BamHI site would be predicted to encode
a protein of 111 kDa, in good agreement with the 115 kDa
estimated for the apparent molecular mass of the
pHMW1-encoded fusion protein.
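The 16-fold ATCTTTC array in the 5'-flanking region is the kind of feature a scan for back-to-back copies of a repeat unit reports. The sketch below is a hypothetical helper (`max_tandem_copies` is not from the original work). It assumes exact, uninterrupted copies on a single strand; repeat arrays containing mismatches would need a more tolerant method.

```python
def max_tandem_copies(seq: str, unit: str) -> int:
    """Largest number of exact, back-to-back copies of `unit` found anywhere in `seq`."""
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    if k == 0:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    for start in range(len(seq) - k + 1):
        copies, pos = 0, start
        # Step forward one unit at a time while the next k bases repeat the unit exactly.
        while seq[pos:pos + k] == unit:
            copies += 1
            pos += k
        best = max(best, copies)
    return best


if __name__ == "__main__":
    # Hypothetical toy flank: 16 copies of the 7-bp unit reported for HMW1, with arbitrary ends.
    flank = "GC" + "ATCTTTC" * 16 + "GATTA"
    print(max_tandem_copies(flank, "ATCTTTC"))  # 16
```

The scan is quadratic in the worst case, which is acceptable for a flanking region of a few hundred base pairs.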

The sequence of the HMW2 gene (Figure 3) consists of
a 4,431-bp ORF, beginning with an ATG codon at nucleotide
352 and ending with a TAG stop codon at nucleotide 4783.
The first 1,259 bp of the ORF of the HMW2 gene are
identical to those of the HMW1 gene. Thereafter, the
sequences begin to diverge but are 80% identical overall.
With the exception of a single base addition at
nucleotide 93 of the HMW2 sequence, the 5'-flanking
regions of the HMW1 and HMW2 genes are identical for 310
bp upstream from the respective initiation codons. Thus,
the HMW2 gene is preceded by the same set of tandem
repeats and the same putative ribosome-binding site which
lies 5' of the HMW1 gene. A putative transcriptional
terminator identical to that identified 3' of the HMW1
ORF is noted, beginning at nucleotide 4804. The
discrepancy in the lengths of the two genes is
principally accounted for by a 186-bp gap in the HMW2
sequence, beginning at nucleotide position 3839. The
derived amino acid sequence of the protein encoded by the
HMW2 gene (Figure 4) has a molecular weight of 155,000
and is 71% identical with the derived amino acid sequence
of the HMW1 gene.
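The ORF coordinates given for HMW1 and HMW2 can be cross-checked with simple arithmetic. The sketch below assumes (our reading of the numbers, not a statement in the text) that each coordinate marks the first base of its codon, so the stop codon is excluded from the length. Under that assumption both ORF lengths are exact multiples of three, and the codon count equals the number of residues encoded. `orf_summary` is a hypothetical helper.

```python
def orf_summary(atg_start: int, stop_start: int) -> tuple[int, int]:
    """Return (coding length in bp, number of sense codons) for an ORF.

    Assumes both arguments are 1-based coordinates of the FIRST base of the ATG
    and of the stop codon, so the stop codon itself is not counted.
    """
    length = stop_start - atg_start
    if length <= 0 or length % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return length, length // 3


if __name__ == "__main__":
    hmw1 = orf_summary(351, 4959)
    hmw2 = orf_summary(352, 4783)
    print(hmw1, hmw2)         # (4608, 1536) (4431, 1477)
    print(hmw1[0] - hmw2[0])  # 177
```

The net length difference between the two ORFs therefore works out to 177 bp under this convention.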

The derived amino acid sequences of both the HMW1
and HMW2 genes (Figures 2 and 4) demonstrated sequence
similarity with the derived amino acid sequence of
filamentous hemagglutinin of Bordetella pertussis, a
surface-associated protein of this organism. The initial
and optimized TFASTA scores for the HMW1-filamentous
hemagglutinin sequence comparison were 87 and 186,
respectively, with a word size of 2. The z score for the
comparison was 45.8. The initial and optimized TFASTA
scores for the HMW2-filamentous hemagglutinin sequence
comparison were 68 and 196, respectively. The z score
for the latter comparison was 48.7. The magnitudes of
the initial and optimized TFASTA scores and the z scores
suggested that a biologically significant relationship
existed between the HMW1 and HMW2 gene products and
filamentous hemagglutinin. When the derived amino acid
sequences of the HMW1, HMW2, and filamentous hemagglutinin
genes were aligned and compared, the similarities were
most notable at the amino-terminal ends of the three
sequences. Twelve of the first 22 amino acids in the
predicted peptide sequences were identical. In addition,
the sequences demonstrated a common five-amino-acid
stretch, Asn-Pro-Asn-Gly-Ile, and several shorter
stretches of sequence identity within the first 200 amino
acids.
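Statements such as "twelve of the first 22 amino acids were identical" or "80% identical overall" are percent-identity figures over an alignment. The sketch below shows one common convention: identical columns divided by columns where neither sequence has a gap. This is an assumption on our part. Published identity values depend on the alignment program and on the denominator chosen, so this helper (hypothetical, and performing no alignment itself) will not necessarily reproduce the figures quoted above.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap.

    `a` and `b` must already be aligned to the same length; no alignment is performed here.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    print(round(100.0 * 12 / 22, 1))           # 54.5 ("twelve of the first 22")
    print(percent_identity("NPNGI", "NPNGI"))  # 100.0
```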
Example 2:

This Example describes the relationship between
filamentous hemagglutinin and the HMW1 protein.

To further explore the HMW1-filamentous
hemagglutinin relationship, the ability of antiserum
prepared against the HMW1-4 recombinant protein (rHMW1)
to recognize purified filamentous hemagglutinin was
assessed (Figure 13). The rHMW1 antiserum demonstrated
ELISA reactivity with filamentous hemagglutinin in a
dose-dependent manner. Preimmune rabbit serum had
minimal reactivity in this assay. The rHMW1 antiserum
was also examined in a Western blot assay, where it
demonstrated weak but positive reactivity with purified
filamentous hemagglutinin.

To identify the native Haemophilus protein
corresponding to the HMW1 gene product and to determine
the extent to which proteins antigenically related to the
HMW1 cloned gene product were common among other
non-typeable H. influenzae strains, a panel of Haemophilus
strains was screened by Western blot with the rHMW1
antiserum. The antiserum recognized both a 125- and a
120-kDa protein band in the homologous strain 12 (Figure
14), the putative mature protein products of the HMW1 and
HMW2 genes, respectively. The 120-kDa protein appears as
a single band in Figure 14, whereas it appeared as a
doublet in the HMW2 phage lysates (Figure 11).

When used to screen heterologous non-typeable H.
influenzae strains, the rHMW1 antiserum recognized
high-molecular-weight proteins in 75% of 125
epidemiologically unrelated strains. In general, the
antiserum reacted with one or two protein bands in the
100- to 150-kDa range in each of the heterologous strains
in a pattern similar but not identical to that seen in
the homologous strain (Figure 14).

Monoclonal antibody X3C is a murine IgG antibody
directed against the filamentous hemagglutinin protein of
B. pertussis. This antibody can inhibit the binding of
B. pertussis cells to Chinese hamster ovary cells and
HeLa cells in culture and will inhibit hemagglutination
of erythrocytes by purified filamentous hemagglutinin.
A Western blot assay was performed in which this
monoclonal antibody was screened against the same panel
of non-typeable H. influenzae strains discussed above
(Figure 14). Monoclonal antibody X3C recognized both of
the high-molecular-weight proteins in non-typeable H.
influenzae strain 12 which were recognized by the
recombinant-protein antiserum (Figure 15). In addition,
the monoclonal antibody recognized protein bands in a
subset of heterologous non-typeable H. influenzae strains
which were identical to those recognized by the
recombinant-protein antiserum, as may be seen by
comparison of Figures 14 and 15. On occasion, the
filamentous hemagglutinin monoclonal antibody appeared to
recognize only one of the two bands which had been
recognized by the recombinant-protein antiserum (compare
strain lane 18 in Figures 14 and 15, for example).
Overall, monoclonal antibody X3C recognized
high-molecular-weight protein bands identical to those
recognized by the rHMW1 antiserum in approximately 35% of
our collection of non-typeable H. influenzae strains.
Example 3:

This Example describes the adhesin properties of the
HMW1 and HMW2 proteins.

Mutants deficient in expression of HMW1, HMW2 or
both proteins were constructed to examine the role of
these proteins in bacterial adherence. The following
strategy was employed. pHMW1-14 (see Example 1, Figure
5A) was digested with BamHI and then ligated to a
kanamycin cassette isolated on a 1.3-kb BamHI fragment
from pUC4K. The resultant plasmid (pHMW1-17) was
linearized by digestion with XbaI and transformed into
non-typeable H. influenzae strain 12, followed by
selection for kanamycin-resistant colonies. Southern
analysis of a series of these colonies demonstrated two
populations of transformants, one with an insertion in
the HMW1 structural gene and the other with an insertion
in the HMW2 structural gene. One mutant from each of
these classes was selected for further studies.

Mutants deficient in expression of both proteins
were recovered using the following protocol. After
deletion of the 2.1-kb fragment of DNA between two EcoRI
sites spanning the 3'-portion of the HMW1 structural gene
and the 5'-portion of a downstream gene encoding an
accessory processing protein in pHMW1-15, the kanamycin
cassette from pUC4K was inserted as a 1.3-kb EcoRI
fragment. The resulting plasmid (pHMW1-16) was
linearized by digestion with XbaI and transformed into
strain 12, followed again by selection for
kanamycin-resistant colonies. Southern analysis of a
representative sampling of these colonies demonstrated
that in seven of eight cases, insertion into both the
HMW1 and HMW2 loci had occurred. One such mutant was
selected for further studies.

To confirm the intended phenotypes, the mutant
strains were examined by Western blot analysis with a
polyclonal antiserum against recombinant HMW1 protein.
The parental strain expressed both the 125-kD HMW1 and
the 120-kD HMW2 proteins (Figure 16). In contrast, the
HMW2- mutant failed to express the 120-kD protein, and the
HMW1- mutant failed to express the 125-kD protein. The
double mutant lacked expression of both proteins. On
the basis of whole cell lysates, outer membrane profiles,
and colony morphology, the wild type strain and the
mutants were otherwise identical to one another.
Transmission electron microscopy demonstrated that none
of the four strains expressed pili.

The capacity of wild type strain 12 to adhere to
Chang epithelial cells was examined. In such assays,
bacteria were inoculated into broth and allowed to grow
to a density of ~2 x 10^9 cfu/ml. Approximately 2 x 10^7
cfu were inoculated onto epithelial cell monolayers, and
plates were gently centrifuged at 165 x g for 5 minutes
to facilitate contact between bacteria and the epithelial
surface. After incubation for 30 minutes at 37°C in 5%
CO2, monolayers were rinsed 5 times with PBS to remove
nonadherent organisms and were treated with trypsin-EDTA
(0.05% trypsin, 0.5% EDTA) in PBS to release them from
the plastic support. Well contents were agitated, and
dilutions were plated on solid medium to yield the number
of adherent bacteria per monolayer. Percent adherence
was calculated by dividing the number of adherent cfu per
monolayer by the number of inoculated cfu.
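The adherence arithmetic described above can be written out as a short sketch. The helper names are hypothetical, and the plate count, dilution, plated volume and well volume in the usage block are illustrative assumptions, not data from this study. Only the culture density (~2 x 10^9 cfu/ml), the inoculum (~2 x 10^7 cfu) and the definition of percent adherence come from the text.

```python
def inoculum_volume_ml(target_cfu: float, density_cfu_per_ml: float) -> float:
    """Culture volume (ml) that delivers target_cfu at the given cell density."""
    return target_cfu / density_cfu_per_ml


def adherent_cfu(colonies: int, dilution_factor: float, plated_ml: float, well_ml: float) -> float:
    """Total cfu released from one monolayer, back-calculated from a single plate count.

    dilution_factor is the fold dilution of the well contents before plating (1e4 for 10^-4).
    """
    cfu_per_ml = colonies * dilution_factor / plated_ml
    return cfu_per_ml * well_ml


def percent_adherence(adherent: float, inoculated: float) -> float:
    """Adherent cfu per monolayer divided by inoculated cfu, expressed as a percentage."""
    return 100.0 * adherent / inoculated


if __name__ == "__main__":
    vol = inoculum_volume_ml(2e7, 2e9)  # about 10 µl of a ~2 x 10^9 cfu/ml culture
    # Hypothetical plate count: 180 colonies from 0.1 ml of a 10^-4 dilution of 1 ml well contents.
    recovered = adherent_cfu(180, 1e4, 0.1, 1.0)
    print(f"{vol * 1000:.0f} µl", f"{percent_adherence(recovered, 2e7):.0f}%")
```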

As depicted in Table 1 below (the Tables appear at
the end of the descriptive text), this strain adhered
quite efficiently, with nearly 90% of the inoculum
binding to the monolayer. Adherence by the mutant
expressing HMW1 but not HMW2 (HMW2-) was also quite
efficient and comparable to that by the wild type strain.
In contrast, attachment by the strain expressing HMW2 but
deficient in expression of HMW1 (HMW1-) was decreased
about 15-fold relative to the wild type. Adherence by
the double mutant (HMW1-/HMW2-) was decreased even
further, approximately 50-fold compared with the wild
type and approximately 3-fold compared with the
HMW1- mutant. Considered together, these results suggest
that both the HMW1 protein and the HMW2 protein influence
attachment to Chang epithelial cells. Interestingly,
optimal adherence to this cell line appears to require
HMW1 but not HMW2.
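The "approximately 3-fold" comparison between the double mutant and the HMW1- mutant follows from the two fold-decreases measured against the same wild type. The sketch below simply divides the approximate figures quoted in the text. `relative_fold` is a hypothetical helper, and because the inputs are rounded, the result is only as precise as they are.

```python
def relative_fold(fold_a: float, fold_b: float) -> float:
    """Given two fold-decreases measured against the same reference strain,
    return how many fold lower strain A is than strain B."""
    if fold_a <= 0 or fold_b <= 0:
        raise ValueError("fold changes must be positive")
    return fold_a / fold_b


if __name__ == "__main__":
    # Approximate decreases relative to wild type strain 12: double mutant ~50-fold, HMW1- ~15-fold.
    print(round(relative_fold(50, 15), 1))  # 3.3
    # The same arithmetic for strain 5 (Example 4): ~25-fold versus ~5-fold.
    print(relative_fold(25, 5))  # 5.0
```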
Example 4:

This Example illustrates the preparation and
expression of HMW3 and HMW4 proteins and their function
as adhesins.

Using the plasmids pHMW1-16 and pHMW1-17 (see
Example 3) and following a scheme similar to that
employed with strain 12 as described in Example 3, three
non-typeable Haemophilus strain 5 mutants were isolated,
including one with the kanamycin gene inserted into the
hmw1-like (designated hmw3) locus, a second with an
insertion in the hmw2-like (designated hmw4) locus, and
a third with insertions in both loci. As predicted,
Western immunoblot analysis demonstrated that the mutant
with insertion of the kanamycin cassette into the
hmw1-like locus had lost expression of the HMW3 125-kD
protein, while the mutant with insertion into the
hmw2-like locus failed to express the HMW4 123-kD protein.
The mutant with a double insertion was unable to express
either of the high molecular weight proteins.

As shown in Table 1 below, wild type strain 5
demonstrated high level adherence, with almost 80% of the
inoculum adhering per monolayer. Adherence by the mutant
deficient in expression of the HMW2-like protein (i.e.
HMW4 protein) was also quite high. In contrast,
adherence by the mutant unable to express the HMW1-like
protein (i.e. HMW3 protein) was reduced about 5-fold
relative to the wild type, and attachment by the double
mutant was diminished even further (approximately
25-fold). Examination of Giemsa-stained samples confirmed
these observations (not shown). Thus, the results with
strain 5 for proteins HMW3 and HMW4 corroborate the
findings with strain 12 and the HMW1 and HMW2 proteins.
Example 5:

This Example contains additional data concerning the
adhesin properties of the HMW1 and HMW2 proteins.

To confirm an adherence function for the HMW1 and
HMW2 proteins and to examine the effect of HMW1 and HMW2
independently of other H. influenzae surface structures,
the hmw1 and hmw2 gene clusters were introduced into
E. coli DH5α, using plasmids pHMW1-14 and pHMW2-21,
respectively. As a control, the cloning vector, pT7-7,
was also transformed into E. coli DH5α. Western blot
analysis demonstrated that E. coli DH5α containing the
hmw1 genes expressed a 125-kDa protein, while the same
strain harboring the hmw2 genes expressed a 120-kDa
protein. E. coli DH5α containing pT7-7 failed to react
with antiserum against recombinant HMW1. Transmission
electron microscopy revealed no pili or other surface
appendages on any of the E. coli strains.

Adherence by the E. coli strains was quantitated and
compared with adherence by wild type non-typeable H.
influenzae strain 12. As shown in Table 2 below,
adherence by E. coli DH5α containing vector alone was
less than 1% of that for strain 12. In contrast, E. coli
DH5α harboring the hmw1 gene cluster demonstrated
adherence levels comparable to those for strain 12.
Adherence by E. coli DH5α containing the hmw2 genes was
approximately 6-fold lower than attachment by strain 12
but was increased 20-fold over adherence by E. coli DH5α
with pT7-7 alone. These results indicate that the HMW1
and HMW2 proteins are capable of independently mediating
attachment to Chang conjunctival cells. These results
are consistent with the results with the H. influenzae
mutants reported in Examples 3 and 4, providing further
evidence that, with Chang epithelial cells, HMW1 is a
more efficient adhesin than is HMW2.

Experiments with E. coli HB101 harboring pT7-7,
pHMW1-14, or pHMW2-21 confirmed the results obtained with
the DH5α derivatives (see Table 2).
Example 6:

This Example illustrates the copurification of HMW1
and HMW2 proteins from a wild-type non-typeable H.
influenzae strain.

HMW1 and HMW2 were isolated and purified from
non-typeable H. influenzae (NTHI) strain 12 in the
following manner. Non-typeable Haemophilus bacteria from
frozen stock culture were streaked onto a chocolate plate
and grown overnight at 37°C in an incubator with 5% CO2.
A 50-ml starter culture of brain heart infusion (BHI)
broth, supplemented with 10 µg/ml each of hemin and NAD,
was inoculated with growth from the chocolate plate. The
starter culture was grown until the optical density
(OD at 600 nm) reached 0.6 to 0.8, and the bacteria in
the starter culture were then used to inoculate six
500-ml flasks of supplemented BHI, using 8 to 10 ml per
flask. The bacteria were grown in the 500-ml flasks for
an additional 5 to 6 hours, at which time the OD was 1.5
or greater. Cultures were centrifuged at 10,000 rpm for
10 minutes.
Bacterial pellets were resuspended in a total volume
of 250 ml of an extraction solution comprising 0.5 M
NaCl, 0.01 M Na2EDTA, 0.01 M Tris and 50 µM
1,10-phenanthroline, pH 7.5. The cells were not sonicated
or otherwise disrupted. The resuspended cells were
allowed to sit on ice at 0°C for 60 minutes and were then
centrifuged at 10,000 rpm for 10 minutes at 4°C to
remove the majority of intact cells and cellular debris.
The supernatant was collected and centrifuged at
100,000 x g for 60 minutes at 4°C. The supernatant again
was collected and dialyzed overnight at 4°C against
0.01 M sodium phosphate, pH 6.0.
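Making up the 250 ml of extraction solution is a molarity-to-mass conversion: grams = molarity x volume x formula weight. In the sketch below, the concentrations come from the text, but the reagent forms and formula weights are assumptions (EDTA taken as the disodium dihydrate, 1,10-phenanthroline as the anhydrous form). Check them against the labels of the reagents actually used. The Tris would typically be titrated to pH 7.5 before the solution is brought to final volume.

```python
# Formula weights in g/mol. The reagent forms are assumptions; use the values on the actual labels.
FORMULA_WEIGHT = {
    "NaCl": 58.44,
    "Na2EDTA dihydrate": 372.24,
    "Tris base": 121.14,
    "1,10-phenanthroline": 180.21,
}

# Final concentrations (mol/L) of the extraction solution described above.
EXTRACTION_SOLUTION = {
    "NaCl": 0.5,
    "Na2EDTA dihydrate": 0.01,
    "Tris base": 0.01,
    "1,10-phenanthroline": 50e-6,
}


def grams_needed(recipe: dict[str, float], volume_l: float) -> dict[str, float]:
    """Mass of each solute in grams: molarity (mol/L) x volume (L) x formula weight (g/mol)."""
    return {name: molarity * volume_l * FORMULA_WEIGHT[name] for name, molarity in recipe.items()}


if __name__ == "__main__":
    # 250 ml, the total extraction volume used for the pooled pellets.
    for reagent, grams in grams_needed(EXTRACTION_SOLUTION, 0.250).items():
        print(f"{reagent}: {grams:.4f} g")
```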

The sample was centrifuged at 10,000 rpm for 10
minutes at 4°C to remove insoluble debris precipitated
from solution during dialysis. The supernatant was
applied to a 10-ml CM Sepharose column that had been
pre-equilibrated with 0.01 M sodium phosphate, pH 6.
Following application of the sample, the column was
washed with 0.01 M sodium phosphate. Proteins were
eluted from the column with a 0 to 0.5 M KCl gradient in
0.01 M sodium phosphate, pH 6, and fractions were
collected for gel examination. Coomassie-stained gels of
the column fractions were run to identify those fractions
containing high molecular weight proteins. The fractions
containing high molecular weight proteins were pooled and
concentrated to a 1 to 3 ml volume in preparation for
application to the gel filtration column.
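When fractions from a linear salt gradient are screened on gels, it helps to know roughly what KCl concentration each fraction represents. The sketch below assumes equal-volume fractions and an ideal linear gradient, ignoring column dead volume and mixing. The number of fractions in the usage block is hypothetical, since the text does not state it.

```python
def linear_gradient_midpoints(start_m: float, end_m: float, n_fractions: int) -> list[float]:
    """Nominal salt concentration (M) at the midpoint of each equal-volume fraction
    of an ideal linear gradient; dead volume and mixing are ignored."""
    if n_fractions < 1:
        raise ValueError("need at least one fraction")
    step = (end_m - start_m) / n_fractions
    return [start_m + step * (i + 0.5) for i in range(n_fractions)]


if __name__ == "__main__":
    # Hypothetical: the 0 to 0.5 M KCl gradient collected as 20 equal fractions.
    for number, conc in enumerate(linear_gradient_midpoints(0.0, 0.5, 20), start=1):
        print(f"fraction {number:2d}: ~{conc * 1000:.0f} mM KCl")
```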

A Sepharose CL-4B gel filtration column was 
equilibrated with phosphate-buffered saline, pH 7.5. The 
concentrated high molecular weight protein sample was 
applied to the gel filtration column and column fractions 
were collected. Coomassie gels were performed on the 
column fractions to identify those containing high 
molecular weight proteins. The column fractions 
containing high molecular weight proteins were pooled. 
Example 7:

This Example illustrates the use of purified HMW1
and HMW2 proteins in immunization studies.

The copurified HMW1 and HMW2 proteins prepared as
described in Example 6 were tested to determine whether
they would protect against experimental otitis media
caused by the homologous strain.

Healthy adult chinchillas, 1 to 2 years of age with
weights of 350 to 500 g, received three monthly
subcutaneous injections of 40 µg of an HMW1-HMW2
protein mixture in Freund's adjuvant. Control animals
received phosphate-buffered saline in Freund's adjuvant.
One month after the last injection, the animals were
challenged by intrabullar inoculation with 300 cfu of
NTHI strain 12.

Middle ear infection developed in 5 of 5 control
animals versus 5 of 10 immunized animals. Although only
5 of 10 chinchillas were protected in this test, the test
conditions are very stringent, requiring bacteria to be
injected directly into the middle ear space and to
proliferate in what is in essence a small abscess cavity.
As seen from the additional data below, complete
protection of chinchillas can be achieved.

The five HMW1/HMW2-immunized animals that did not
develop otitis media demonstrated no signs of middle ear
inflammation when examined by otoscopy, nor were middle
ear effusions detectable.

Among the five HMW1/HMW2-immunized animals that
became infected, the total duration of middle ear
infection, as assessed by the persistence of
culture-positive middle ear fluid, was not different from
that in controls. However, the degree of inflammation of
the tympanic membranes was subjectively less in the
HMW1/HMW2-immunized animals than in the controls. When
quantitative bacterial counts were performed on the middle
ear fluid specimens recovered from infected animals,
notable differences were apparent between the
HMW1/HMW2-immunized and PBS-immunized animals (Figure 17).
Shown in Figure 17 are quantitative middle ear fluid
bacterial counts from animals on day 7 post-challenge, a
time point associated with the maximum colony counts in
middle ear fluid. The data were log-transformed for
purposes of statistical comparison. The data from the
control animals are shown on the left and data from the
high molecular weight protein-immunized animals on the
right. The two horizontal lines indicate the respective
means and standard deviations of middle ear fluid colony
counts for only the infected animals in each group. As
can be seen from this Figure, the HMW1/HMW2-immunized
animals had significantly lower middle ear fluid bacterial
counts than the PBS-immunized controls, with geometric
means of 7.4 x 10^[...] and 1.3 x 10^5, respectively
(p = 0.02, Student's t-test).
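The comparison described above (log-transform the counts, report geometric means, compare groups with Student's t-test) can be sketched with the Python standard library. The counts in the usage block are hypothetical and are NOT the study data. The helpers compute only the pooled-variance t statistic and its degrees of freedom. A p-value needs the t distribution: `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` is the same pooled test, returns the statistic together with a two-sided p-value.

```python
import math
import statistics


def log10_counts(counts: list[float]) -> list[float]:
    """Log-transform colony counts (all counts must be positive, i.e. infected animals only)."""
    return [math.log10(c) for c in counts]


def geometric_mean(counts: list[float]) -> float:
    """Geometric mean: 10 raised to the arithmetic mean of the log10 counts."""
    return 10 ** statistics.mean(log10_counts(counts))


def student_t(a_counts: list[float], b_counts: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic (pooled, equal variances) on log10 counts, with its df.

    Each group needs at least two values for the sample variance to be defined.
    """
    a, b = log10_counts(a_counts), log10_counts(b_counts)
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


if __name__ == "__main__":
    # Hypothetical counts (cfu/ml), for illustration only.
    immunized = [1e3, 1e4, 1e5]
    control = [1e4, 1e5, 1e6]
    t, df = student_t(immunized, control)
    print(f"{geometric_mean(immunized):.3g} {geometric_mean(control):.3g} t={t:.2f} df={df}")
```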

Serum antibody titres following immunization were
comparable in uninfected and infected animals. However,
infection in immunized animals was uniformly associated
with the appearance of bacteria down-regulated in
expression of the HMW proteins, suggesting bacterial
selection in response to immunologic pressure.

Although these data show that protection following
immunization was not complete, they suggest that the HMW
adhesin proteins are potentially important protective
antigens which may comprise one component of a
multi-component NTHI vaccine.

In addition, complete protection has been achieved 
in the chinchilla model at lower dosage challenge, as set 
forth in Table 3 below. 

Groups of five animals were immunized with 20 µg of
the HMW1-HMW2 mixture prepared as described in Example 6
on days 1, 28 and 42 in the presence of alum. Blood
samples were collected on day 53 to monitor the antibody
response. On day 56, the left ear of each animal was
challenged with about 10 cfu of H. influenzae strain 12.
Ear infection was monitored on day 4. Four animals in
Group 3 had previously been infected by H. influenzae
strain 12 and had recovered completely for at least one
month before the second challenge.
Example 8: 

This Example illustrates the provision of synthetic
peptides corresponding to only a portion of the HMW1
protein.



A number of synthetic peptides were derived from
HMW1, and antisera were then raised to these peptides.
The antiserum to peptide HMW1-P5 was shown to recognize
HMW1. Peptide HMW1-P5 covers amino acids 1453 to 1481
of HMW1, has the sequence
VDEVIEAKRILEKVKDLSDEEREALAKLG (SEQ ID No: 11), and
represents bases 1498 to 1576 in Figure 10.
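A quick consistency check on the peptide definition is that a 1-based, inclusive residue range of 1453 to 1481 spans 29 residues, which matches the length of the stated sequence. The helper names below are hypothetical. `residue_slice` shows how the same coordinates would pull the peptide out of a full HMW1 sequence (Figure 2, not reproduced here).

```python
# Peptide HMW1-P5 (SEQ ID No: 11) and its stated residue range in HMW1.
HMW1_P5 = "VDEVIEAKRILEKVKDLSDEEREALAKLG"
P5_FIRST, P5_LAST = 1453, 1481


def residue_slice(protein: str, first: int, last: int) -> str:
    """Residues first..last of `protein`, using 1-based, inclusive coordinates."""
    return protein[first - 1:last]


def span_length(first: int, last: int) -> int:
    """Number of residues in an inclusive 1-based range."""
    return last - first + 1


if __name__ == "__main__":
    print(len(HMW1_P5), span_length(P5_FIRST, P5_LAST))  # 29 29
```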

This finding demonstrates that the DNA sequence and
the derived protein are being interpreted in the correct
reading frame and that peptides derived from the sequence
can be produced which will be immunogenic.
Example 9:

This Example describes the generation of monoclonal
antibodies to the high molecular weight proteins of
non-typeable H. influenzae.

Monoclonal antibodies were generated using
standard techniques. In brief, female BALB/c mice (4 to
6 weeks old) were immunized by intraperitoneal injection
with high molecular weight proteins purified from
nontypable Haemophilus strain 5 or strain 12, as
described in Example 6. The first injection of 40 to 50
µg of protein was administered with Freund's complete
adjuvant, and the second dose, given four to five weeks
after the first, was administered in phosphate-buffered
saline. Three days following the second injection, the
mice were sacrificed and splenic lymphocytes were fused
with Sp2/0-Ag14 plasmacytoma cells.

Two weeks following fusion, hybridoma supernatants
were screened for the presence of high molecular weight
protein-specific antibodies by a dot-blot assay.
Purified high molecular weight proteins at a
concentration of 10 µg per ml in Tris-buffered saline
(TBS) were used to sensitize nitrocellulose sheets
(Bio-Rad Laboratories, Richmond, CA) by soaking for 20
minutes. Following a blocking step with TBS-3% gelatin,
the nitrocellulose was incubated for 60 minutes at room
temperature with individual hybridoma supernatants, at a
1:5 dilution in TBS-0.1% Tween, using a 96-well Bio-Dot
micro-filtration apparatus (Bio-Rad). After washing, the
sheets were incubated for one hour with alkaline
phosphatase-conjugated, affinity-isolated goat anti-(mouse
IgG + IgM) antibodies (Tago, Inc., Burlingame, CA).
Following additional washes, positive supernatants were
identified by incubation of the nitrocellulose sheet in
alkaline phosphatase buffer (0.10 M Tris, 0.10 M NaCl,
0.005 M MgCl2) containing nitroblue tetrazolium
(0.1 mg/ml) and 5-bromo-4-chloro-3-indolyl phosphate
(BCIP) (0.05 mg/ml).

For the antibody isotyping and immunoelectron
microscopy studies described below, the monoclonal
antibodies were purified from hybridoma supernatants.
The antibodies recovered in this work were all of the IgG
class. To purify the monoclonal antibodies, the
hybridoma supernatants were first subjected to ammonium
sulfate precipitation (50% final concentration at 0°C).
Following overnight incubation, the precipitate was
recovered by centrifugation and resolubilized in
phosphate-buffered saline. The solution was then
dialyzed overnight against 0.01 M sodium phosphate
buffer, pH 6.0. The following day the sample was applied
to a DEAE-Sephacel column pre-equilibrated with the same
phosphate buffer, and the proteins were subsequently
eluted with a KCl gradient. Column fractions containing
the monoclonal antibodies were identified by examination
of samples on Coomassie-stained gels for protein bands
typical of light and heavy chains.

The isotype of each monoclonal antibody was
determined by immunodiffusion using the Ouchterlony
method. Immunodiffusion plates were prepared on glass
slides with 10 ml of 1% DNA-grade agarose (FMC
Bioproducts, Rockland, ME) in phosphate-buffered saline.
After the agarose solidified, 5-mm wells were punched
into the agarose in a circular pattern. The center well
contained a concentrated preparation of the monoclonal
antibody being evaluated and the surrounding wells
contained goat anti-mouse subclass-specific antibodies
(Tago). The plates were incubated for 48 hours in a
humid chamber at 4°C and then examined for white lines of
immunoprecipitation.

Hybridoma supernatants which were reactive in the
dot-blot assay described above were examined by Western
blot analysis, both to confirm their reactivity with the
high molecular weight proteins of the homologous
nontypable Haemophilus strain and to examine their
cross-reactivity with similar proteins in heterologous
strains. Nontypable Haemophilus influenzae cell sonicates
containing 100 ng of total protein were solubilized in
electrophoresis sample buffer, subjected to
SDS-polyacrylamide gel electrophoresis on 7.5% acrylamide
gels, and transferred to nitrocellulose using a Genie
electrophoretic blotter (Idea Scientific Company,
Corvallis, OR) for 45 min at 24 V. After transfer, the
nitrocellulose sheet was blocked and then probed
sequentially with the hybridoma supernatant and with
alkaline phosphatase-conjugated goat anti-(mouse IgG +
IgM) second antibody; bound antibodies were finally
detected by incubation with nitroblue tetrazolium/BCIP
solution. This same assay was employed to examine the
reactivity of the monoclonals with recombinant fusion
proteins expressed in E. coli (see below).

In preparation for immunoelectron microscopy,
bacteria were grown overnight on supplemented chocolate
agar and several colonies were suspended in
phosphate-buffered saline containing 1% albumin. A 20-µl
drop of this bacterial suspension was then applied to a
carbon-coated grid and incubated for 2 min. Excess fluid
was removed and the specimen was then incubated for 5 min
with the purified high molecular weight protein-specific
monoclonal antibody. [...] and a wash with
phosphate-buffered saline, [...] to 10-nm colloidal gold
particles. Following [...] phosphate-buffered saline, the
samples were rinsed with distilled water. Staining of the
bacterial cells was performed with 0.5% uranyl acetate
for [...].

Fourteen different hybridomas were recovered which
produced monoclonal antibodies reactive with the [...] of
nontypable Haemophilus [...] screening assay. Of the
monoclonals screened by immunoelectron microscopy to
date, [...] strain 12 [...] monoclonal antibodies,
designated AD6 (ATCC [...]) and 10C5 (ATCC [...]), were
both of the IgG1 subclass.

Example 10:

This Example describes the identification of
surface-exposed B-cell epitopes of high molecular weight
proteins of non-typeable Haemophilus.

To identify the epitopes recognized by the monoclonal
antibodies, their reactivity with a panel of recombinant
[...] was examined. These plasmids were constructed by
cloning various segments of the hmw1A or hmw2A structural
genes into [...] (Promega Corporation, Madison, WI). Shown
in Figures 18 and 19 are the schematic diagrams depicting
the segments [...]. [...] fusion proteins containing
pGEMEX-encoded amino acids in the regions indicated by the
hatched bars
and hmw1A- or hmw2A-encoded amino acids in the regions
indicated by the black bars in these Figures. A stop
codon is present at the junction of the black and white
segments of each bar.
Four discrete sites within the hmw1A structural gene were selected as the 5' ends of the hmw1 inserts. For each 5' end, a series of progressively smaller inserts was created by taking advantage of convenient downstream restriction sites. The first recombinant plasmid depicted in Figure 18 was constructed by isolating a 4.9 kbp BamHI-HindIII fragment from pHMW1-14 (Example 1, Figure 5A), which contains the entire hmw1 gene cluster, and inserting it into BamHI-HindIII-digested pGEMEX®-1. The second recombinant plasmid in this set was constructed by digesting the "parent" plasmid with BstEII-HindIII, recovering the 6.8 kbp larger fragment, blunt-ending with Klenow DNA polymerase, and religating. The third recombinant plasmid in this set was constructed by digesting the "parent" plasmid with ClaI-HindIII, recovering the 6.0 kbp larger fragment, blunt-ending, and religating. The next set of four hmw1 recombinant plasmids was derived from a "parent" plasmid constructed by ligating a 2.2 kbp EcoRI fragment from the hmw1 gene cluster into EcoRI-digested pGEMEX®-2. The other three recombinant plasmids in this second set were constructed by digesting at downstream BstEII, EcoRV, and ClaI sites, respectively, using techniques similar to those just described. The third set of three recombinant plasmids depicted was derived from a "parent" plasmid constructed by double-digesting the first recombinant plasmid described above (i.e. the one containing the 4.9 kbp BamHI-HindIII fragment) with BamHI and ClaI, blunt-ending, and religating. This resulted in a construct encoding a recombinant protein with an in-frame fusion at the ClaI site of the hmw1A gene. The remaining two plasmids in this third set were constructed by digesting at downstream BstEII and EcoRV sites, respectively. Finally, the fourth set of two recombinant plasmids was derived from a "parent" plasmid constructed by double-digesting the original BamHI-HindIII construct with HincII and EcoRV, then religating. This resulted in a construct encoding a recombinant protein with an in-frame fusion at the EcoRV site of the hmw1A gene. The remaining plasmid in this fourth set was constructed by digesting at the downstream BstEII site.
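By way of illustration only, the "digest, recover the larger fragment, blunt-end and religate" steps above reduce to simple arithmetic on a circular plasmid. The sketch below assumes hypothetical site coordinates (the disclosure does not give them) and computes the fragment lengths from a double digest and the larger piece that would be gel-purified and religated.

```python
def circular_fragments(plasmid_len_bp, cut_positions):
    """Fragment lengths (bp) from cutting a circular plasmid at the given
    0-based positions. A single cut only linearizes the circle."""
    cuts = sorted({p % plasmid_len_bp for p in cut_positions})
    if len(cuts) <= 1:
        return [plasmid_len_bp]
    # Lengths between consecutive cuts, plus the piece that wraps past position 0.
    frags = [b - a for a, b in zip(cuts, cuts[1:])]
    frags.append(plasmid_len_bp - cuts[-1] + cuts[0])
    return frags


def larger_fragment_after_double_digest(plasmid_len_bp, site_a, site_b):
    """Length of the larger fragment of a double digest, i.e. the piece
    that would be recovered, blunt-ended and religated."""
    return max(circular_fragments(plasmid_len_bp, [site_a, site_b]))


# Hypothetical 8,900 bp parent plasmid with sites at 1,000 and 3,100 (illustrative only).
print(circular_fragments(8900, [1000, 3100]))                 # [2100, 6800]
print(larger_fragment_after_double_digest(8900, 3100, 1000))  # 6800
```

The wrap-around term is what distinguishes a circular digest from a linear one: the fragment spanning the numbering origin must be counted once.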

Three discrete sites within the hmw2A structural gene were selected as the 5' ends of the hmw2 inserts. The first recombinant plasmid depicted in Figure 19 was constructed by isolating a 6.0 kbp EcoRI-XhoI fragment from pHMW2-21, which contains the entire hmw2 gene cluster, and inserting it into EcoRI-SalI-digested pGEMEX®-1. The second recombinant plasmid in this set was constructed by digesting at an MluI site near the 3' end of the hmw2A gene. The second set of two hmw2 recombinant plasmids was derived from a "parent" plasmid constructed by isolating a 2.3 kbp HindIII fragment from pHMW2-21 and inserting it into HindIII-digested pGEMEX®-2. The remaining plasmid in this second set was constructed by digesting at the downstream MluI site. Finally, the last plasmid depicted was constructed by isolating a 1.2 kbp HincII-HindIII fragment from the indicated location in the hmw2 gene cluster and inserting it into HincII-HindIII-digested pGEMEX®-1.

Each of the recombinant plasmids was used to transform E. coli strain JM101. The resulting transformants were used to generate the recombinant fusion proteins employed in the mapping studies. To prepare recombinant proteins, the transformed E. coli strains were grown to an absorbance of 0.5 in L broth containing 50 µg of ampicillin per ml. IPTG was then added to 1 mM, and mGP1-2, the M13 phage containing the T7 RNA polymerase gene, was added at a multiplicity of infection of 10. One hour later, cells were harvested and a sonicate of the cells was prepared. The protein concentrations of the samples were determined, and cell sonicates containing 100 µg of total protein were solubilized in electrophoresis sample buffer, subjected to SDS-polyacrylamide gel electrophoresis, and examined on Coomassie-stained gels to assess the expression level of the recombinant fusion proteins. Once high levels of expression of the recombinant fusion proteins were confirmed, the cell sonicates were used in the Western blot analyses described above.
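The multiplicity of infection of 10 and the 100 µg protein load above translate into simple volume calculations. The sketch below is illustrative only: the cells-per-absorbance conversion, the phage stock titer and the sonicate protein concentration are assumed placeholder values, none of which is given in this disclosure.

```python
def cells_in_culture(culture_ml, absorbance, cells_per_ml_per_unit):
    # Assumed linear conversion from absorbance to cell density.
    return culture_ml * absorbance * cells_per_ml_per_unit


def phage_volume_ml(moi, n_cells, titer_pfu_per_ml):
    # MOI = infectious phage particles added per bacterial cell.
    return moi * n_cells / titer_pfu_per_ml


def load_volume_ul(target_ug, protein_ug_per_ul):
    # Volume of sonicate that delivers the target mass of total protein.
    return target_ug / protein_ug_per_ul


cells = cells_in_culture(10, 0.5, 1e9)   # 10 ml culture, assumed 1e9 cells/ml per unit
print(phage_volume_ml(10, cells, 1e11))  # 0.5 (ml of an assumed 1e11 pfu/ml stock)
print(load_volume_ul(100, 2.5))          # 40.0 (µl of an assumed 2.5 µg/µl sonicate)
```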

Shown in Figure 20 is an electron micrograph demonstrating surface binding of Mab AD6 to representative nontypable Haemophilus influenzae strains. The upper left panel of the Figure shows nontypable Haemophilus strain 12, and the upper right panel shows a strain 12 derivative that no longer expresses the high molecular weight proteins. As can be seen, colloidal gold particles decorate the surface of strain 12, indicating bound AD6 antibody on the surface. In contrast, no gold particles are evident on the surface of the strain 12 mutant that no longer expresses the high molecular weight proteins. These results indicate that monoclonal antibody AD6 recognizes a surface-exposed epitope on the high molecular weight proteins of strain 12. Analogous studies were performed with monoclonal antibody 10C5, demonstrating that it too bound to surface-accessible epitopes on the high molecular weight HMW1 and HMW2 proteins of strain 12.

Having identified two surface-binding monoclonals, the inventor next mapped the epitope recognized by each. To accomplish this task, the two sets of recombinant plasmids containing various portions of either the hmw1A or hmw2A structural genes (Figures 18 and 19) were employed. With these complementary sets of recombinant plasmids, the epitopes recognized by the monoclonal antibodies were mapped to relatively small regions of the very large HMW1 and HMW2 proteins.

To localize epitopes recognized by Mab AD6, the pattern of reactivity of this monoclonal antibody with a large set of recombinant fusion proteins was examined. Figure 21 is a Western blot which demonstrates the pattern of reactivity of Mab AD6 with five recombinant fusion proteins, a relevant subset of the larger number originally examined. From analysis of the pattern of reactivity of Mab AD6 with this set of proteins, the epitope it recognizes can be mapped to a very short segment of the HMW1 and HMW2 proteins. A brief summary of this analysis follows. For reference, the relevant portions of the hmw1A or hmw2A structural genes which were expressed in the recombinant proteins being examined are indicated in the diagram at the top of the figure. As shown in lane 1, Mab AD6 recognizes an epitope encoded by fragment 1, a fragment which encompasses the distal one-fourth of the hmw1A gene. Reactivity is lost when only the portion of the gene comprising fragment 2 is expressed. This observation localizes the AD6 epitope somewhere within the last 180 amino acids at the carboxy-terminal end of the HMW1 protein. Mab AD6 also recognizes an epitope encoded by fragment 3, derived from the hmw2A structural gene. This is a rather large fragment which encompasses nearly one-third of the gene. Reactivity is lost when fragment 4 is expressed. The only difference between fragments 3 and 4 is that the last 225 base pairs at the 3' end of the hmw2A structural gene were deleted in the latter construct. This observation indicates that the AD6 epitope is encoded by this short terminal segment of the hmw2A gene, which corresponds to 75 codons (225 bp ÷ 3). Strong support for this idea is provided by the demonstrated binding of Mab AD6 to the recombinant protein encoded by fragment 5, a fragment encompassing the distal one-tenth of the hmw2A structural gene. Taken together, these data identify the AD6 epitope as common to both the HMW1 and HMW2 proteins and place its location within 75 amino acids of the carboxy termini of the two proteins.
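The deletion-mapping reasoning for Mab AD6 (reactivity with a longer construct, loss of reactivity with a shorter one, so the epitope lies in the residues the shorter one lacks) can be written as interval arithmetic. This is a sketch only: the residue coordinates are hypothetical (fragment extents appear only in the figures), and it makes the same simplifying assumption as the text, namely a single linear epitope lying wholly outside each non-reactive fragment.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) residue coordinates


def bp_to_residues(bp: int) -> int:
    """Number of codons (amino acids) encoded by an in-frame stretch of DNA."""
    if bp % 3:
        raise ValueError("length is not a whole number of codons")
    return bp // 3


def localize_epitope(reactive: List[Interval],
                     nonreactive: List[Interval]) -> List[Interval]:
    """Residues shared by every reactive fragment, minus residues present in
    any non-reactive fragment (assumes one linear epitope)."""
    start = max(s for s, _ in reactive)
    end = min(e for _, e in reactive)
    regions = [(start, end)] if start <= end else []
    for ns, ne in nonreactive:
        trimmed = []
        for s, e in regions:
            if ne < s or ns > e:      # no overlap: keep region unchanged
                trimmed.append((s, e))
                continue
            if s < ns:                # part left of the non-reactive fragment
                trimmed.append((s, ns - 1))
            if ne < e:                # part right of it
                trimmed.append((ne + 1, e))
        regions = trimmed
    return regions


# Hypothetical coordinates loosely modelled on fragments 3, 4 and 5.
frag3, frag5 = (700, 1000), (900, 1000)
frag4 = (700, 1000 - bp_to_residues(225))  # fragment 3 minus its last 225 bp
print(localize_epitope([frag3, frag5], [frag4]))  # [(926, 1000)]
```

With these placeholder numbers the candidate region is exactly the 75 C-terminal residues removed from fragment 4.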

Figure 22 is a Western blot demonstrating the pattern of reactivity of Mab 10C5 with the same five recombinant fusion proteins examined in Figure 21. As shown in lane 1, Mab 10C5 recognizes an epitope encoded by fragment 1. In contrast to Mab AD6, Mab 10C5 also recognizes an epitope encoded by fragment 2. Also in contrast to Mab AD6, Mab 10C5 does not recognize any of the hmw2A-derived recombinant fusion proteins. Thus, these data identify the 10C5 epitope as being unique to the HMW1 protein and as being encoded by the fragment designated as fragment 2 in this figure. This fragment corresponds to a 155-amino acid segment encoded by the EcoRV-BstEII segment of the hmw1A structural gene.
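The reactivity patterns described for Figures 21 and 22 can be tabulated to show how an epitope is classed as shared by HMW1 and HMW2 or unique to one of them. The lane calls below transcribe the results stated in the text; the data structures and function names are illustrative only.

```python
# Source gene product of each fusion-protein fragment (Figures 21 and 22).
FRAGMENT_SOURCE = {1: "HMW1", 2: "HMW1", 3: "HMW2", 4: "HMW2", 5: "HMW2"}

# Western blot reactivity as stated in the text (True = band detected).
REACTIVITY = {
    "AD6":  {1: True, 2: False, 3: True,  4: False, 5: True},
    "10C5": {1: True, 2: True,  3: False, 4: False, 5: False},
}


def proteins_with_epitope(mab):
    """Proteins contributing at least one reactive fragment."""
    return sorted({FRAGMENT_SOURCE[f] for f, hit in REACTIVITY[mab].items() if hit})


def classify(mab):
    hits = proteins_with_epitope(mab)
    if len(hits) > 1:
        return "shared"
    return f"{hits[0]}-specific" if hits else "not detected"


for mab in REACTIVITY:
    print(mab, proteins_with_epitope(mab), classify(mab))
# AD6 ['HMW1', 'HMW2'] shared
# 10C5 ['HMW1'] HMW1-specific
```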

Having identified the approximate locations of the epitopes on HMW1 and HMW2 recognized by the two monoclonals, the inventor next determined the extent to which these epitopes were shared by the high molecular weight proteins of heterologous nontypable Haemophilus strains. When examined in Western blot assays with bacterial cell sonicates, Mab AD6 was reactive with epitopes expressed on the high molecular weight proteins of 75% of the inventor's collection of more than 125 nontypable Haemophilus influenzae strains. In fact, this monoclonal appeared to recognize epitopes expressed on high molecular weight proteins in virtually all nontypable Haemophilus strains previously identified as expressing HMW1/HMW2-like proteins. Figure 23 is an example of a Western blot demonstrating the reactivity of Mab AD6 with a representative panel of such heterologous strains. As can be seen, the monoclonal antibody recognizes one or two bands in the 100 to 150 kDa range in each of these strains. For reference, the strain shown in lane 1 is prototype strain 12, and the two bands visualized represent HMW1 and HMW2 as the upper and lower immunoreactive bands, respectively.

In contrast to the broad cross-reactivity observed with Mab AD6, Mab 10C5 was much more limited in its ability to recognize high molecular weight proteins in heterologous strains. Mab 10C5 recognized high molecular weight proteins in approximately 40% of the strains which expressed HMW1/HMW2-like proteins. As was the case with Mab AD6, Mab 10C5 did not recognize proteins in any of the nontypable Haemophilus strains which did not express HMW1/HMW2-like proteins.

In a limited fashion, the reactivity of Mab AD6 with surface-exposed epitopes on the heterologous strains has been examined. The bottom two panels of Figure 20 are electron micrographs demonstrating the reactivity of Mab AD6 with surface-accessible epitopes on nontypable Haemophilus strains 5 and 15. As can be seen, abundant colloidal gold particles are evident on the surfaces of each of these strains, confirming their surface expression of the AD6 epitope. Although limited in scope, these data suggest that the AD6 epitope may be a common surface-accessible epitope on the high molecular weight adhesion proteins of most nontypable Haemophilus influenzae that express HMW1/HMW2-like proteins.



SUMMARY OF DISCLOSURE
In summary of this disclosure, the present invention provides high molecular weight proteins of non-typeable Haemophilus, genes coding for the same and vaccines incorporating such proteins. Modifications are possible within the scope of this invention.
TABLE 1: Effect of mutation of high molecular weight proteins on adherence to Chang epithelial cells by nontypable H. influenzae.

                                 ADHERENCE*
Strain                       % Inoculum      Relative to wild type (%)†
Strain 12 derivatives
  wild type                  87.76 ± 5.9     100.0 ± 6.7
  HMW1⁻ mutant                6.0 ± 0.9        6.8 ± 1.0
  HMW2⁻ mutant               89.9 ± 10.8     102.5 ± 12.3
  HMW1⁻/HMW2⁻ mutant          2.0 ± 0.3        2.3 ± 0.3
Strain 5 derivatives
  wild type                  78.7 ± 3.2      100.0 ± 4.1
  HMW1-like mutant           15.7 ± 2.6       19.9 ± 3.3
  HMW2-like mutant          103.7 ± 14.0     131.7 ± 17.8
  double mutant               3.5 ± 0.6        4.4 ± 0.8

* Numbers represent mean (± standard error of the mean) of measurements in triplicate or quadruplicate from representative experiments.
† Adherence values for strain 12 derivatives are relative to strain 12 wild type; values for strain 5 derivatives are relative to strain 5 wild type.
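The "relative to wild type" column of Table 1 appears to be each strain's percent-of-inoculum adherence divided by the corresponding wild-type mean and scaled so that wild type equals 100, with the standard error scaled by the same factor. The sketch below reflects that reading of the footnotes; it is an interpretation, not a method stated in the disclosure. For example, 6.0 × 100 / 87.76 ≈ 6.8 for the strain 12 HMW1⁻ mutant.

```python
def relative_adherence(mean_pct, sem_pct, wild_type_mean_pct):
    """Express adherence (% of inoculum) and its SEM relative to the
    wild-type mean, scaled so that wild type = 100 (rounded to 0.1)."""
    scale = 100.0 / wild_type_mean_pct
    return round(mean_pct * scale, 1), round(sem_pct * scale, 1)


print(relative_adherence(6.0, 0.9, 87.76))  # (6.8, 1.0)  strain 12 HMW1- mutant
print(relative_adherence(15.7, 2.6, 78.7))  # (19.9, 3.3) strain 5 HMW1-like mutant
```

The wild-type mean is treated as an exact reference here, so its own error is not propagated into the relative values.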
TABLE 2: Adherence by E. coli DH5α and HB101 harboring hmw1 or hmw2 gene clusters.

Strain*              Adherence relative to H. influenzae strain 12†
DH5α (pT7-7)            0.7 ± 0.02
DH5α (pHMW1-14)       114.2 ± 15.9
DH5α (pHMW2-21)        14.0 ± 3.7
HB101 (pT7-7)           1.2 ± 0.5
HB101 (pHMW1-14)       93.6 ± 15.8
HB101 (pHMW2-21)        3.6 ± 0.9

* The plasmid pHMW1-14 contains the hmw1 gene cluster, while pHMW2-21 contains the hmw2 gene cluster; pT7-7 is the cloning vector used in these constructs.
† Numbers represent the mean (± standard error of the mean) of measurements made in triplicate from representative experiments.
TABLE 3: Protective ability of HMW protein against non-typeable H. influenzae challenge in chinchilla model

                                      Number of animals showing positive ear infection (#)
Group  Antigens       Total animals   Tympanogram   Otoscopic examination   cfu of bacteria [...]
1      HMW            5               0             0                       0
2      None           5               5             5                       850-3200 (4/5)
3      Convalescent   4               0             0                       0
ACAGCGTTCT CTTAATACTA GTACAAACCC ACAATAAAAT ATGACAAACA ACAATTACAA 60
CACCTTTTTT GCAGTCTATA TGCAAATATT TTAAAAAATA GTATAAATOC GOCATATAAA 120
ATGGTATAAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC ATCTTTCATC 180
TTTCATCTTT CATCTTTCAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC 240

ACATGCCCTG ATCAACCGAG GGAAGGGAGG GAGGGGCAAG AATGAAGAGG GAGCTGAACG 300 

AACGCAAATG ATAAAGTAAT TTAATTGTTC AACTAACCTT AGGAGAAAAT ATGAACAAGC 360 

TATATCGTCT CAAATTCAGC AAACGCCTGA ATGCTTTGGT TGCTGTGTCT GAATTGGCAC 420 

GGGGTTGTGA CCATTCCACA GAAAAAGGCA GCGAAAAACC TGCTCGCATG AAAGTGCGTC 480 

ACTTAGCGTT AAAGCCACTT TCCGCTATGT TACTATCTTT AGGTGTAACA TCTATTCCAC 540 

AATCTGTTTT AGCAAGCGGC TTACAAGGAA TGGATGTAGT ACACGGCACA GCCACTATGC 600 

AAGTAGATGG TAATAAAACC ATTATCCGCA ACAGTGTTGA CGATATCATT AATTGGAAAG 660

AATTTAACAT CGACCAAAAT GAAATGGTGC AGTTTTTACA AGAAAACAAC AACTCCGCCG 720 

TATTCAACCG TCTTACATCT AACCAAATCT CCCAATTAAA AGGOATTTTA GATTCTAACG 780 

GACAAGTCTT TTTAATCAAC CCAAATGOTA TCACAATAGG TAAAGACGCA ATTATTAACA 840 

CTAATGGCTT TACGGCTTCT ACGCTAGACA TTTCTAACGA AAACATCAAG OCGCGTAATT 900 

TCACCTTCGA GCAAACCAAA GATAAAGCGC TCGCTGAAAT TGTGAATCAC GGTTTAATTA 960 

CTGTOGGTAA AGACGGCAGT GTAAATCTTA TMGTOGCAA AGTGAAAAAC GAGGGTGTGA 1020 

TTAGCGTAAA TGGTGGCAGC ATTTCTTTAC TCGCAGGGCA AAAAATCACC ATCAGCGATA 1080 

TAATAAACCC AACCATTACT TACAGCATTG CCGOGCCTGA AAATGAAGCG GTCAATCTGG 1140 

GCGATATTTT TGCCAAAGGC GGTAACATTA ATGTCCGTGC TGCCACTATT CGAAACCAAG 1200 

GTAAACTTTC TGCTGATTCT GTAAGCAAAG ATAAAAGCGG CAATATTGTT CTTTCCGCCA 1260 

AAGAGGGTGA AGCGGAAATT GGOGGTGTAA TTTCCGCTCA AAATCAGCAA GCTAAAGGCG 1320 

GCAAGCTGAT GATTACAGGC GATAAAGTCA CATTAAAAAC AGGTGCAGTT ATCGACCTTT 1380 

CAGGTAAAGA AGGGGGAGAA ACTTACCTTG GCGGTGACGA GCGCGGCGAA GGTAAAAAGG 1440

GCATTCAATT AGCAAAGAAA ACCTCTTTAG AAAAAGGCTC AACCATCAAT GTATCAGGCA 1500 

AAGAAAAAGG CGGACGCGCT ATTGTGTGGG GOGATATTGC GTTAATTGAC GGCAATATTA 1560 

AOGCTCAAGG TAGTGGTGAT ATCGCTAAAA CCGGTGGTTT TGTCGAOACG TCGGGGCATG 1620 

ATTTATTCAT CAAAGACAAT GCAATTGTTG ACGCCAAAGA GTGGTTGTTA GACCCGGATA 1680 

ATGTATCTAT TAATGCAGAA ACAGCAGGAC GCAGCAATAC TTCAGAAGAC GATGAATACA 1740 

CGGGATCOGG GAATAGTGCC AGCACCCCAA AACGAAACAA AGAAAAGACA ACATTAACAA 1800 

ACACAACTCT TOAGAGTATA CTAAAAAAAG CTACCTTTOT TAACATCACT GCTAATCAAC 1860 

GCATCTATGT CAATAGCTCC ATTAATTTAT CCAATGGCAG CTTAACTCTT TGGAGTGAGG 1920 

GTCGGAGCGG TGGCGGOGTT GAGATTAACA ACGATATTAC CACCGGT6AT GATACCAGAG 1980 

GTGCAAACTT AACAATTTAC TCAGGCGGCT GGGTTGATGT TCATAAAAAT ATCTCACTCG 2040 

GGGCGCAAGG TAACATAAAC ATTACAGCTA AACAAGATAT CGCCTTTGAG AAAGGAAGCA 2100 

ACCAAGTCAT TACAGGTCAA GGGACTATTA CCTCAGGCAA TCAAAAAGGT TTTAGATTTA 2160 

ATAATGTCTC TCTAAACGGC ACTGGCAGCG GACTGCAATT CACCACTAAA AGAACCAATA 2220 
AATACGCTAT CACAAATAAA TTTGAAGGGA CTTTAAATAT TTCAGGGAAA GTGAACATCT 2280

CAATCGTTTT ACCTAAAAAT GAAAGTGGAT ATGATAAATT CAAAGGACGC ACTTACTGGA 2340 

ATTTAACCTC CTTAAATGTT TCCGAGAGTQ GCGAGTTTAA CCTCACTATT GACTCCAGAG 2400 

GAAGCGATAG TGCAGGCACA CTTACCCAGC CTTATAATTT AAACGGTATA TCATTCAACA 2460 

AAGACACTAC CTTTAATGTT GAACGAAATG CAAGAGTCAA CTTTGACATC AAGGCACCAA 2520

TAGGGATAAA TAAGTATTCT AGTTTGAATT ACGCATCATT TAATGGAAAC ATTTCAGTTT 2580 

CGGGAGGGGG GAGTGTTGAT TTCACACTTC TCGCCTCATC CTCTAACGTC CAAACCCCCG 2640 

GTGTAGTTAT AAATTGTAAA TACTCTAATG TTTGAACAGG GTCAAGTTTA AGATTTAAAA 2700

CTTCAGGCTC AACAAAAACT GGCTTCTCAA TAGAGAAAGA TTTAACTTTA AATGCGACCG 2760 

GAGGCAACAT AACACTTTTG CAAGTTGAAG GCACCGATGG AATGATTGGT AAAGGCATTG 2820 

TAGCCAAAAA AAACATAACC TTTGAAGGAG GTAACATCAC CTTTGGCTCC AGGAAAGCCG 2880 

TAACAGAAAT CGAAGGCAAT GTTACTATCA ATAACAAOGC TAACGTCACT CTTATCGGTT 2940 

CGGATTTTGA CAACCATCAA AAACCTTTAA CTATTAAAAA AGATGTCATC ATTAATAGCG 3000 

GCAACCTTAC CGCTGGAGGC AATATTGTCA ATATAGCCGG AAATCTTACC GTTGAAAGTA 3060 

ACGCTAATTT CAAAGCTATC ACAAATTTCA CTTTTAATGT AGGCGGCTTG TTTGACAACA 3120 

AAGGCAATTC AAATATTTCC ATTGCCAAAG GAGGGGCTCG CTTTAAAGAC ATTGATAATT 3180 

CCAAGAATTT AAGCATCACC ACCAACTCCA GCTCCACTTA CCGCACTATT ATAAGCGGCA 3240 

ATATAACCAA TAAAAACGGT GATTTAAATA TTACGAACGA AGGTAGTGAT ACTGAAATGC 3300 

AAATTGGCGG CGATGTCTCG CAAAAAGAAG GTAATCTCAC GATTTCTTCT GACAAAATCA 3360 

ATATTACCAA ACAGATAACA ATCAAGGCAG GTGTTGATGG GGAGAATTCC GATTCAGACG 3420 

CGACAAACAA TGCCAATCTA ACCATTAAAA CCAAAGAATT GAAATTAACG CAAGACCTAA 3480 

ATATTTCAGG TTTCAATAAA GCAGAGATTA CAGCTAAAGA TGGTAGTGAT TTAACTATTG 3540 

GTAACACCAA TAGTGCTGAT GGTACTAATG CCAAAAAAGT AACCTTTAAC CAGGTTAAAG 3600 

ATTCAAAAAT CTCTGCTGAC GGTCACAAGG TGACACTACA CAGCAAAGTG GAAACATCCG 3660 

GTAGTAATAA CAACACTGAA GATAGCAGTG ACAATAATGC CGGCTTAACT ATCGATGCAA 3720 

AAAATGTAAC AGTAAACAAC AATATTACTT CTCACAAAGC AGTGAGCATC TCTGCGACAA 3780 

GTGGAGAAAT TACCACTAAA ACAGGTACAA CCATTAACGC AACCACTGGT AACGTGGAGA 3840 

TAACCGCTCA AACAGGTAGT ATCCTAGOTG GAATTQAGTC CAGCTCTGGC TCTGTAACAC 3900 

TTACTGCAAC CGAGGGCOCT CTTGCTGTAA GCAATATTTC GGGCAACACC GTTACTGTTA 3960 

CTGCAAATAG CGGTGCATTA ACCACTTTGG CAGGCTCTAC AATTAAAGGA ACCGAGAGTG 4020 

TAACCACTTC AAGTCAATCA GGCGATATCG GCGGTACGAT TTCTGGTGGC ACAGTAGAGG 4080 

TTAAAGCAAC OGAAAGTTTA ACCACTCAAT CCAATTCAAA AATTAAAGCA ACAACAGGCG 4140 

AGGCTAACGT AACAAGTGCA ACAGGTACAA TTGGTGGTAC GATTTCCGGT AATACGGTAA 4200 

ATGTTACGGC AAACGCTGGC GATTTAACAG TTGGGAATGG CGCAGAAATT AATGCGACAG 4260 
AGACGCTGAG
AATCACAATA
AGGCGTTAAA
AATACAATAA
GGAGTAAGTG

GAATTTGCAA
TGGGTTAAAG
AAGGAGCTGC AACCTTAACT ACATCATCGG GCAAATTAAC
TTACTTCAGC CAAGGGTCAG GTAAATCTTT CAGCTCAGGA 
TTAATGCCGC CAATGTGACA CTAAATACTA CAGGCACTTT 
ACATTAATGC AACCAGCGGT ACCTTGGTTA TTAACGCAAA 
CAGCATTGGG TAACCACACA GTGGTAAATG CAACCAACGC 
TCGCGACAAC CTCAAGCAGA GTGAACATCA CTGGGGATTT 
ATATCATTTC AAAAAACGGT ATAAACACCG TACTGTTAAA 
AATAGATTCA ACCGGGTATA QCAAGCGTAG ATGAAGTAAT
AGAAGGTAAA AGATTTATCT GATGAAGAAA GAGAAGCGTT 
CTGTACGTTT TATTGAGCCA AATAATACAA TTACAGTCGA 
CCAGACCATT AAGTCGAATA GTGATTTCTG AAGGCAGGGC 
GCGCGACGGT GTGCGTTAAT ATCGCTGATA ACGGGCGGTA 
TAGATTTCAT CCTGCAATGA AGTCATTTTA TTTTCGTATT 
TTCAGTACGG GCTTTACCCA TCTTGTAAAA AATTACGGAG 

ACAGGTTATT ATTATG 

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1536 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15

Val Ala Val Ser Glu Leu Ala Arg Gly Cys Asp His Ser Thr Glu Lys
20 25 30

Gly Ser Glu Lys Pro Ala Arg Met Lys Val Arg His Leu Ala Leu Lys
35 40 45

Pro Leu Ser Ala Met Leu Leu Ser Leu Gly Val Thr Ser Ile Pro Gln
50 55 60

Ser Val Leu Ala Ser Gly Leu Gln Gly Met Asp Val Val His Gly Thr
65 70 75 80

Ala Thr Met Gln Val Asp Gly Asn Lys Thr Ile Ile Arg Asn Ser Val
85 90 95

Asp Ala Ile Ile Asn Trp Lys Gln Phe Asn Ile Asp Gln Asn Glu Met
100 105 110

Val Gln Phe Leu Gln Glu Asn Asn Asn Ser Ala Val Phe Asn Arg Val
115 120 125



Thr Ser Asn Gln Ile Ser Gln Leu Lys Gly Ile Leu Asp Ser Asn Gly
130 135 140

Gln Val Phe Leu Ile Asn Pro Asn Gly Ile Thr Ile Gly Lys Asp Ala
145 150 155 160

Ile Ile Asn Thr Asn Gly Phe Thr Ala Ser Thr Leu Asp Ile Ser Asn
165 170 175

Glu Asn Ile Lys Ala Arg Asn Phe Thr Phe Glu Gln Thr Lys Asp Lys
180 185 190

Ala Leu Ala Glu Ile Val Asn His Gly Leu Ile Thr Val Gly Lys Asp
195 200 205

Gly Ser Val Asn Leu Ile Gly Gly Lys Val Lys Asn Glu Gly Val Ile
210 215 220

Ser Val Asn Gly Gly Ser Ile Ser Leu Leu Ala Gly Gln Lys Ile Thr
225 230 235 240

Ile Ser Asp Ile Ile Asn Pro Thr Ile Thr Tyr Ser Ile Ala Ala Pro
245 250 255

Glu Asn Glu Ala Val Asn Leu Gly Asp Ile Phe Ala Lys Gly Gly Asn
260 265 270

Ile Asn Val Arg Ala Ala Thr Ile Arg Asn Gln Gly Lys Leu Ser Ala
275 280 285

Asp Ser Val Ser Lys Asp Lys Ser Gly Asn Ile Val Leu Ser Ala Lys
290 295 300

Glu Gly Glu Ala Glu Ile Gly Gly Val Ile Ser Ala Gln Asn Gln Gln
305 310 315 320

Ala Lys Gly Gly Lys Leu Met Ile Thr Gly Asp Lys Val Thr Leu Lys
325 330 335

Thr Gly Ala Val Ile Asp Leu Ser Gly Lys Glu Gly Gly Glu Thr Tyr
340 345 350

Leu Gly Gly Asp Glu Arg Gly Glu Gly Lys Asn Gly Ile Gln Leu Ala
355 360 365

Lys Lys Thr Ser Leu Glu Lys Gly Ser Thr Ile Asn Val Ser Gly Lys
370 375 380

Glu Lys Gly Gly Arg Ala Ile Val Trp Gly Asp Ile Ala Leu Ile Asp
385 390 395 400

Gly Asn Ile Asn Ala Gln Gly Ser Gly Asp Ile Ala Lys Thr Gly Gly
405 410 415

Phe Val Glu Thr Ser Gly His Asp Leu Phe Ile Lys Asp Asn Ala Ile
420 425 430

Val Asp Ala Lys Glu Trp Leu Leu Asp Phe Asp Asn Val Ser Ile Asn
435 440 445

Ala Glu Thr Ala Gly Arg Ser Asn Thr Ser Glu Asp Asp Glu Tyr Thr
450 455 460

Gly Ser Gly Asn Ser Ala Ser Thr Pro Lys Arg Asn Lys Glu Lys Thr
465 470 475 480
Thr Leu Thr Asn Thr Thr Leu Glu Ser Ile Leu Lys Lys Gly Thr Phe
485 490 495

Val Asn Ile Thr Ala Asn Gln Arg Ile Tyr Val Asn Ser Ser Ile Asn
500 505 510

Leu Ser Asn Gly Ser Leu Thr Leu Trp Ser Glu Gly Arg Ser Gly Gly
515 520 525

Gly Val Glu Ile Asn Asn Asp Ile Thr Thr Gly Asp Asp Thr Arg Gly
530 535 540

Ala Asn Leu Thr Ile Tyr Ser Gly Gly Trp Val Asp Val His Lys Asn
545 550 555 560

Ile Ser Leu Gly Ala Gln Gly Asn Ile Asn Ile Thr Ala Lys Gln Asp
565 570 575

Ile Ala Phe Glu Lys Gly Ser Asn Gln Val Ile Thr Gly Gln Gly Thr
580 585 590

Ile Thr Ser Gly Asn Gln Lys Gly Phe Arg Phe Asn Asn Val Ser Leu
595 600 605

Asn Gly Thr Gly Ser Gly Leu Gln Phe Thr Thr Lys Arg Thr Asn Lys
610 615 620

Tyr Ala Ile Thr Asn Lys Phe Glu Gly Thr Leu Asn Ile Ser Gly Lys
625 630 635 640

Val Asn Ile Ser Met Val Leu Pro Lys Asn Glu Ser Gly Tyr Asp Lys
645 650 655

Phe Lys Gly Arg Thr Tyr Trp Asn Leu Thr Ser Leu Asn Val Ser Glu
660 665 670

Ser Gly Glu Phe Asn Leu Thr Ile Asp Ser Arg Gly Ser Asp Ser Ala
675 680 685

Gly Thr Leu Thr Gln Pro Tyr Asn Leu Asn Gly Ile Ser Phe Asn Lys
690 695 700

Asp Thr Thr Phe Asn Val Glu Arg Asn Ala Arg Val Asn Phe Asp Ile
705 710 715 720

Lys Ala Pro Ile Gly Ile Asn Lys Tyr Ser Ser Leu Asn Tyr Ala Ser
725 730 735

Phe Asn Gly Asn Ile Ser Val Ser Gly Gly Gly Ser Val Asp Phe Thr
740 745 750

Leu Leu Ala Ser Ser Ser Asn Val Gln Thr Pro Gly Val Val Ile Asn
755 760 765

Ser Lys Tyr Phe Asn Val Ser Thr Gly Ser Ser Leu Arg Phe Lys Thr
770 775 780

Ser Gly Ser Thr Lys Thr Gly Phe Ser Ile Glu Lys Asp Leu Thr Leu
785 790 795 800

Asn Ala Thr Gly Gly Asn Ile Thr Leu Leu Gln Val Glu Gly Thr Asp
805 810 815

Gly Met Ile Gly Lys Gly Ile Val Ala Lys Lys Asn Ile Thr Phe Glu
820 825 830
Gly Gly Asn Ile Thr Phe Gly Ser Arg Lys Ala Val Thr Glu Ile Glu
835 840 845

Gly Asn Val Thr Ile Asn Asn Asn Ala Asn Val Thr Leu Ile Gly Ser
850 855 860

Asp Phe Asp Asn His Gln Lys Pro Leu Thr Ile Lys Lys Asp Val Ile
865 870 875 880

Ile Asn Ser Gly Asn Leu Thr Ala Gly Gly Asn Ile Val Asn Ile Ala
885 890 895

Gly Asn Leu Thr Val Glu Ser Asn Ala Asn Phe Lys Ala Ile Thr Asn
900 905 910

Phe Thr Phe Asn Val Gly Gly Leu Phe Asp Asn Lys Gly Asn Ser Asn
915 920 925

Ile Ser Ile Ala Lys Gly Gly Ala Arg Phe Lys Asp Ile Asp Asn Ser
930 935 940

Lys Asn Leu Ser Ile Thr Thr Asn Ser Ser Ser Thr Tyr Arg Thr Ile
945 950 955 960

Ile Ser Gly Asn Ile Thr Asn Lys Asn Gly Asp Leu Asn Ile Thr Asn
965 970 975

Glu Gly Ser Asp Thr Glu Met Gln Ile Gly Gly Asp Val Ser Gln Lys
980 985 990

Glu Gly Asn Leu Thr Ile Ser Ser Asp Lys Ile Asn Ile Thr Lys Gln
995 1000 1005

Ile Thr Ile Lys Ala Gly Val Asp Gly Glu Asn Ser Asp Ser Asp Ala
1010 1015 1020

Thr Asn Asn Ala Asn Leu Thr Ile Lys Thr Lys Glu Leu Lys Leu Thr
1025 1030 1035 1040

Gln Asp Leu Asn Ile Ser Gly Phe Asn Lys Ala Glu Ile Thr Ala Lys
1045 1050 1055

Asp Gly Ser Asp Leu Thr Ile Gly Asn Thr Asn Ser Ala Asp Gly Thr
1060 1065 1070

Asn Ala Lys Lys Val Thr Phe Asn Gln Val Lys Asp Ser Lys Ile Ser
1075 1080 1085

Ala Asp Gly His Lys Val Thr Leu His Ser Lys Val Glu Thr Ser Gly
1090 1095 1100

Ser Asn Asn Asn Thr Glu Asp Ser Ser Asp Asn Asn Ala Gly Leu Thr
1105 1110 1115 1120

Ile Asp Ala Lys Asn Val Thr Val Asn Asn Asn Ile Thr Ser His Lys
1125 1130 1135

Ala Val Ser Ile Ser Ala Thr Ser Gly Glu Ile Thr Thr Lys Thr Gly
1140 1145 1150

Thr Thr Ile Asn Ala Thr Thr Gly Asn Val Glu Ile Thr Ala Gln Thr
1155 1160 1165

Gly Ser Ile Leu Gly Gly Ile Glu Ser Ser Ser Gly Ser Val Thr Leu
1170 1175 1180
Thr Ala Thr Glu Gly Ala Leu Ala Val Ser Asn Ile Ser Gly Asn Thr
1185 1190 1195 1200

Val Thr Val Thr Ala Asn Ser Gly Ala Leu Thr Thr Leu Ala Gly Ser
1205 1210 1215

Thr Ile Lys Gly Thr Glu Ser Val Thr Thr Ser Ser Gln Ser Gly Asp
1220 1225 1230

Ile Gly Gly Thr Ile Ser Gly Gly Thr Val Glu Val Lys Ala Thr Glu
1235 1240 1245

Ser Leu Thr Thr Gln Ser Asn Ser Lys Ile Lys Ala Thr Thr Gly Glu
1250 1255 1260

Ala Asn Val Thr Ser Ala Thr Gly Thr Ile Gly Gly Thr Ile Ser Gly
1265 1270 1275 1280

Asn Thr Val Asn Val Thr Ala Asn Ala Gly Asp Leu Thr Val Gly Asn
1285 1290 1295

Gly Ala Glu Ile Asn Ala Thr Glu Gly Ala Ala Thr Leu Thr Thr Ser
1300 1305 1310

Ser Gly Lys Leu Thr Thr Glu Ala Ser Ser His Ile Thr Ser Ala Lys
1315 1320 1325

Gly Gln Val Asn Leu Ser Ala Gln Asp Gly Ser Val Ala Gly Ser Ile
1330 1335 1340

Asn Ala Ala Asn Val Thr Leu Asn Thr Thr Gly Thr Leu Thr Thr Val
1345 1350 1355 1360

Lys Gly Ser Asn Ile Asn Ala Thr Ser Gly Thr Leu Val Ile Asn Ala
1365 1370 1375

Lys Asp Ala Glu Leu Asn Gly Ala Ala Leu Gly Asn His Thr Val Val
1380 1385 1390

Asn Ala Thr Asn Ala Asn Gly Ser Gly Ser Val Ile Ala Thr Thr Ser
1395 1400 1405

Ser Arg Val Asn Ile Thr Gly Asp Leu Ile Thr Ile Asn Gly Leu Asn
1410 1415 1420

Ile Ile Ser Lys Asn Gly Ile Asn Thr Val Leu Leu Lys Gly Val Lys
1425 1430 1435 1440

Ile Asp Val Lys Tyr Ile Gln Pro Gly Ile Ala Ser Val Asp Glu Val
1445 1450 1455

Ile Glu Ala Lys Arg Ile Leu Glu Lys Val Lys Asp Leu Ser Asp Glu
1460 1465 1470

Glu Arg Glu Ala Leu Ala Lys Leu Gly Val Ser Ala Val Arg Phe Ile
1475 1480 1485

Glu Pro Asn Asn Thr Ile Thr Val Asp Thr Gln Asn Glu Phe Ala Thr
1490 1495 1500

Arg Pro Leu Ser Arg Ile Val Ile Ser Glu Gly Arg Ala Cys Phe Ser
1505 1510 1515 1520

Asn Ser Asp Gly Ala Thr Val Cys Val Asn Ile Ala Asp Asn Gly Arg
1525 1530 1535
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4937 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

TAAATATACA AGATAATAAA AATAAATCAA GATTTTTGTG ATGACAAACA ACAATTACAA 60

CACCTTTTTT GCAGTCTATA TGCAAATATT TTAAAAAAAT AGTATAAATC CGCCATATAA 120

AATGGTATAA TCTTTCATCT TTCATCTTTA ATCTTTCATC TTTCATCTTT CATCTTTCAT 180

CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC ATCTTTCATC TTTCATCTTT 240 

CACATGAAAT GATGAACCGA GGGAAGGGAG GGAGGGGCAA GAATGAAGAG GGAGCTGAAC 300 

GAACGCAAAT GATAAAGTAA TTTAATTGTT CAACTAACCT TAGGAGAAAA TATGAACAAG 360 

ATATATCGTC TCAAATTCAG CAAAOGCCTG AATGCTTTGG TTGCTGTGTC TGAATTGGCA 420 

CGGGGTTGT3 ACCATTCCAC AGAAAAAGGC TTCCGCTATG TTACTATCTT TAGGTGTAAC 480 

CACTTAGCGT TAAAGCCACT TTCCGCTATG TTACTATCTT TAGGTGTAAC ATCTATTCCA 540 

CAATCTGTTT TAGCAAGCGG CTTACAAGGA ATGGATGTAG TACACGGCAC AGCCACTATG 600 

CAAGTAGATG GTAATAAAAC CATTATCCGC AACAGTGTTG ACGCTATCAT TAATTGGAAA 660 

CAATTTAACA TCGACCAAAA TGAAATGGTG CAGTTTTTAC AAGAAAACAA CAACTCCGCC 720 

GTATTCAACC GTGTTACATC TAACCAAATC TCCCAATTAA AAGGGATTTT AGATTCTAAC 780 

GGACAAGTCT TTTTAATCAA CCCAAATGGT ATCACAATAG GTAAAGACGC AATTATTAAC 840 

ACTAATGGCT TTACGGCTTC TACGCTAGAC ATTTCTAACG AAAACATCAA GGCGCGTAAT 900 

TTCACCTTCG AGCAAACCAA AGATAAAGCG CTCGCTGAAA TTGTGAATCA CGGTTTAATT 960 

ACTGTCGGTA AAGACGGCAG TGTAAATCTT ATTGGTGGCA AAGTGAAAAA CGAGGGTGTG 1020 

ATTAGCGTAA ATGGTGGCAG CATTTCTTTA CTCGCAGGGC AAAAAATCAC CATCAGCGAT 1080 

ATAATAAACC CAACCATTAC TTACAGCATT GCOGCGCCTG AAAATGAAGC GGTCAATCTG 1140 

GGCGATATTT TTGCCAAAGG CGGTAACATT AATGTCCGTG CTGOCACTAT TCGAAACCAA 1200 

GGTAAACTTT CTGCTGATTC TGTAAGCAAA GATAAAAGCG GCAATATTGT TCTTTCCGCC 1260 

AAAGAGGGTG AAGCGGAAAT TGGCGGTGTA ATTTCCGCTC AAAATCAGCA AGCTAAAGGC 1320 

GGCAAGCTGA TGATTACAGG CGATAAAGTC ACATTAAAAA CAGGTGCAGT TATCGACCTT 1380 

TCAGGTAAAG AAGGGGGAOA AACTTACCTT GGCGGTGACG AGCGCGGCGA AGGTAAAAAC 1440 

GGCATTCAAT TAGCAAAGAA AACCTCTTTA GAAAAAGGCT CAACCATCAA TGTATCAGGC 1500 

AAAGAAAAAG GCGGACGCGC TATTGTGTGG GGCGATATTG CGTTAATTGA CGGCAATATT 1560 

AACGCTCAAG GTAGTGGTGA TATCGCTAAA ACCGGTGGTT TTGTGGAGAC ATCGGGGCAT 1620 

TATTTATCCA TTGACAGCAA TGCAATTCTT AAAACAAAAG AGTGGTTGCT AGACCCTGAT 1680 
GATGTAACAA TTGAAGCCGA AGACCCCCTT CGCAATAATA CCGGTATAAA TGATGAATTC
CCAACAGGCA CCGGTGAAGC AAGCGACCCT AAAAAAAATA GCGAACTCAA AACAACGCTA 
ACCAATACAA CTATTTCAAA TTATCTGAAA AACGCCTCGA CAATGAATAT AACGGCATCA 
AGAAAACTTA CCGTTAATAG CTCAATCAAC ATCGGAAGCA ACTCCCACTT AATTCTCCAT 
AGTAAAGGTC AGCGTGGCGG AGGCGTTCAG ATTGATGGAG ATATTACTTC TAAAGGCGGA 
AATTTAACCA TTTATTCTGG CGGATGGGTT GATGTTCATA AAAATATTAC GCTTGATCAG 
GGTTTTTTAA ATATTACCGC CGCTTCCGTA GCTTTTGAAG GTGGAAATAA CAAAGCACGC 
GACGCGGCAA ATGCTAAAAT TGTCGCCCAG GGCACTGTAA CCATTACAGG AGAGGGAAAA 
GATTTCAGGG CTAACAACGT ATCTTTAAAC GGAACGGOTA AAGGTCTGAA TATCATTTCA 
TCAGTGAATA ATTTAAOCCA CAATCTTAGT GGCACAATTA ACATATCTGG GAATATAACA 
ATTAACCAAA CTACGAGAAA GAACACCTCG TATTGGCAAA CCAGCCATGA TTCGCACXGG 
AACGTCAGTG CTCTTAATCT AGAGACAGGC GCAAATTTTA CCTTTATTAA ATACATTTCA 
AGCAATAGCA AAGGCTTAAC AACACAGTAT AGAAGCTCTG CAGGGGTGAA TTTTAACOGC 
GTAAATGGCA ACATOTCATT CAATCTCAAA GAAGGAGCGA AAGTTAATTT CAAATTAAAA 
CCAAACGAGA ACATGAACAC AAGCAAACCT TTACCAATTC GGTTTTTAGC CAATATCACA 
GCCACTGOTG GGGGCTCTGT TTTTTTTGAT AXATATGCCA ACCATTCTGG CAGAGGGGCT 
GAGTTAAAAA TGAGTGAAAT TAATATCTCT AACGGCGCTA ATTTTACCTT AAATTCCCAT 
GTTCGCGGCG ATGACGCTTT TAAAATCAAC AAAGACTTAA CCATAAATGC AACCAATTCA 
AATTTCAGCC TCAGACAGAC GAAAGATGAT TTTTATGACG GGTACGCACG CAATGCCATC 
AATTCAACCT ACAACATATC CATTCTGGGC GGTAATGTCA CCCTTGGTGG ACAAAACTCA 
AGCAGCAGCA TTACGGGGAA TATTACTATC GAGAAAGCAG CAAATGTTAC GCTAGAAGCC 
AATAACGCCC CTAATCAGCA AAACATAAGG GATAGAGTTA TAAAACTTGG CAGCTTGCTC 
GTTAATGGGA GTTTAAGTTT AACTGGCGAA AATGCAGATA TTAAAGGCAA TCTCACTATT 
TCAGAAAGCG CCACTTTTAA AGGAAAGACT AGAGATACCC TAAATATCAC CGGCAATTTT 
ACCAATAATG GCACTGCCGA AATTAATATA ACACAAGGAG TGGTAAAACT TGGCAATGTT 
ACCAATGATG GTGATTTAAA CATTACCACT CACGCTAAAC GCAACCAAAG AAGCATCATC 
GGCGGAGATA TAATCAACAA AAAAGGAAGC TTAAATATTA CAGACAGTAA TAATGATGCT 
GAAATCCAAA TTGGCGGCAA TATCTCGCAA AAAGAAGGCA ACCTCACGAT TTCTTCCGAT
AAAATTAATA TCACCAAACA GATAACAATC AAAAAGGGTA TTGATGGAGA GGACTCTAGT 
TCAGATGCGA CAAGTAATGC CAACCTAACT ATTAAAACCA AAGAATTGAA ATTGACAGAA
GACCTAAGTA TTTCAGGTTT CAATAAAGCA GAGATTACAG CCAAAGATGG TAGAGATTTA 
ACTATTGGCA ACAGTAATGA CGGTAACAGC GGTGCCGAAG CCAAAACAGT AACTTTTAAC 
AATGTTAAAG ATTCAAAAAT CTCTGCTGAC GGTCACAATG TGACACTAAA TAGCAAAGTG 
AAAACATCTA GCAGCAATGG CGGACGTGAA AGCAATAGCG ACAACGATAC CGGCTTAACT 
ATTACTGCAA AAAATGTAGA AGTAAACAAA GATATTACTT CTCTCAAAAC AGTAAATATC 3780

ACCGCGTCGG AAAAGGTTAC CACCACAGCA GGCTCGACCA TTAACGCAAC AAATGGCAAA 3840 

GCAAGTATTA CAACCAAAAC AGGTGATATC AGCGGTACGA TTTCCGGTAA CACGGTAAGT 3900 

GTTAGCGCGA CTGGTGATTT AACCACTAAA TCCGGCTCAA AAATTGAAGC GAAATCGGGT 3960 

GAGGCTAATG TAACAAGTGC AACAGGTACA ATTGGCGGTA CAATTTCCGG TAATACGGTA 4020

AATGTTACGG CAAACGCTGG CGATTTAACA GTTGGGAATG GCGCAGAAAT TAATGCGACA 4080 

GAAGGAGCTG CAACCTTAAC CGCAACAGGG AATACCTTGA CTACTGAAGC CGGTTCTAGC 4140

ATCACTTCAA CTAAGGGTCA GGTAGACCTC TTGGCTCAGA ATGGTAGCAT CGCAGGAAGC 4200 

ATTAATGCTG CTAATGTGAC ATTAAATACT ACAGGCACCT TAACCACCGT GGCAGGCTCG 4260

GATATTAAAG CAACCAGCGG CACCTTGGTT ATTAACGCAA AAGATGCTAA GCTAAATGGT 4320 

GATGCATCAG GTGATAGTAC AGAAGTGAAT GCAGTCAACG CAAGCGGCTC TGGTAGTGTG 4380

ACTGCGGCAA CCTCAAGCAG TGTGAATATC ACTGGGGATT TAAACACAGT AAATGGGTTA 4440 

AATATCATTT CGAAAGATGG TAGAAAGACT GTGCGCTTAA GAGGCAAGGA AATTGAGGTG 4500 

AAATATATCC AGCCAGGTGT AGCAAGTGTA GAAGAAGTAA TTGAAGCGAA ACGCGTCCTT 4560 

GAAAAAGTAA AAGATTTATC TGATGAAGAA AGAGAAACAT TAGCTAAACT TGGTGTAAGT 4620 

GCTGTACGTT TTGTTGAGCC AAATAATACA ATTACAGTCA ATACACAAAA TGAATTTACA 4680 

ACCAGACCGT CAAGTCAAGT GATAATTTCT GAAGGTAAGG CGTGTTTCTC AAGTGGTAAT 4740 

GGCGCACGAG TATGTACCAA TGTTGCTGAC GATGGACAGC CGTAGTCAGT AATTGACAAG 4800 

GTAGATTTCA TCCTGCAATG AAGTCATTTT ATTTTCGTAT TATTTACTGT GTGGGTTAAA 4860

GTTCAGTACG GGCTTTACCC ATCTTGTAAA AAATTACGGA GAATACAATA AAGTATTTTT 4920 

AACAGGTTAT TATTATC 4937 
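Because the number printed at the end of each line should equal the running count of bases up to that point, an OCR-recovered listing such as this one can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the helper name `check_positions` is hypothetical, bases are written in space-separated groups, an optional trailing integer is the cumulative position, and every alphabetic character (including misread letters such as O or X) is counted as one base.

```python
import re

# Optional trailing integer = printed cumulative position of the line's last base.
_LINE = re.compile(r"^(.*?)(?:\s+(\d+))?$")


def check_positions(lines, start=0):
    """Compare printed line-end positions with a running base count.

    `start` is the position printed at the end of the line preceding
    `lines` (0 at the start of a sequence). Returns a list of
    (line_index, counted, printed) tuples for numbered lines that disagree;
    line_index is 1-based within `lines`. Blank lines are skipped.
    """
    total = start
    mismatches = []
    for i, raw in enumerate(lines, 1):
        text = raw.strip()
        if not text:
            continue
        m = _LINE.match(text)
        bases, printed = m.group(1), m.group(2)
        # Count every letter as one base, so OCR misreadings still occupy a position.
        total += sum(ch.isalpha() for ch in bases)
        if printed is not None and int(printed) != total:
            mismatches.append((i, total, int(printed)))
    return mismatches
```

Applied to the last two lines of SEQ ID NO: 3 with `start=4860`, the sketch reports no mismatch, since 4860 + 60 = 4920 and 4920 + 17 = 4937.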
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1477 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15

Val Ala Val Ser Glu Leu Ala Arg Gly Cys Asp His Ser Thr Glu Lys
20 25 30

Gly Ser Glu Lys Pro Ala Arg Met Lys Val Arg His Leu Ala Leu Lys
35 40 45

Pro Leu Ser Ala Met Leu Leu Ser Leu Gly Val Thr Ser Ile Pro Gln
50 55 60
Ser Val Leu Ala Ser Gly Leu Gln Gly Met Asp Val Val His Gly Thr
65 70 75 80

Ala Thr Met Gln Val Asp Gly Asn Lys Thr Ile Ile Arg Asn Ser Val
85 90 95

Asp Ala Ile Ile Asn Trp Lys Gln Phe Asn Ile Asp Gln Asn Glu Met
100 105 110

Val Gln Phe Leu Gln Glu Asn Asn Asn Ser Ala Val Phe Asn Arg Val
115 120 125

Thr Ser Asn Gln Ile Ser Gln Leu Lys Gly Ile Leu Asp Ser Asn Gly
130 135 140

Gln Val Phe Leu Ile Asn Pro Asn Gly Ile Thr Ile Gly Lys Asp Ala
145 150 155 160

Ile Ile Asn Thr Asn Gly Phe Thr Ala Ser Thr Leu Asp Ile Ser Asn
165 170 175

Glu Asn Ile Lys Ala Arg Asn Phe Thr Phe Glu Gln Thr Lys Asp Lys
180 185 190

Ala Leu Ala Glu Ile Val Asn His Gly Leu Ile Thr Val Gly Lys Asp
195 200 205

Gly Ser Val Asn Leu Ile Gly Gly Lys Val Lys Asn Glu Gly Val Ile
210 215 220

Ser Val Asn Gly Gly Ser Ile Ser Leu Leu Ala Gly Gln Lys Ile Thr
225 230 235 240

Ile Ser Asp Ile Ile Asn Pro Thr Ile Thr Tyr Ser Ile Ala Ala Pro
245 250 255

Glu Asn Glu Ala Val Asn Leu Gly Asp Ile Phe Ala Lys Gly Gly Asn
260 265 270

Ile Asn Val Arg Ala Ala Thr Ile Arg Asn Gln Gly Lys Leu Ser Ala
275 280 285

Asp Ser Val Ser Lys Asp Lys Ser Gly Asn Ile Val Leu Ser Ala Lys
290 295 300

Glu Gly Glu Ala Glu Ile Gly Gly Val Ile Ser Ala Gln Asn Gln Gln
305 310 315 320

Ala Lys Gly Gly Lys Leu Met Ile Thr Gly Asp Lys Val Thr Leu Lys
325 330 335

Thr Gly Ala Val Ile Asp Leu Ser Gly Lys Glu Gly Gly Glu Thr Tyr
340 345 350

Leu Gly Gly Asp Glu Arg Gly Glu Gly Lys Asn Gly Ile Gln Leu Ala
355 360 365

Lys Lys Thr Ser Leu Glu Lys Gly Ser Thr Ile Asn Val Ser Gly Lys
370 375 380

Glu Lys Gly Gly Phe Ala Ile Val Trp Gly Asp Ile Ala Leu Ile Asp
385 390 395 400

Gly Asn Ile Asn Ala Gln Gly Ser Gly Asp Ile Ala Lys Thr Gly Gly
405 410 415
Phe Val Glu Thr Ser Gly His Asp Leu Phe Ile Lys Asp Asn Ala Ile
420 425 430

Val Asp Ala Lys Glu Trp Leu Leu Asp Phe Asp Asn Val Ser Ile Asn
435 440 445

Ala Glu Asp Pro Leu Phe Asn Asn Thr Gly Ile Asn Asp Glu Phe Pro
450 455 460

Thr Gly Thr Gly Glu Ala Ser Asp Pro Lys Lys Asn Ser Glu Leu Lys 
465 470 475 480 

Thr Thr Leu Thr Asn Thr Thr Ile Ser Asn Tyr Leu Lys Asn Ala Trp
485 490 495

Thr Met Asn Ile Thr Ala Ser Arg Lys Leu Thr Val Asn Ser Ser Ile
500 505 510

Asn Ile Gly Ser Asn Ser His Leu Ile Leu His Ser Lys Gly Gln Arg
515 520 525

Gly Gly Gly Val Gln Ile Asp Gly Asp Ile Thr Ser Lys Gly Gly Asn
530 535 540

Leu Thr Ile Tyr Ser Gly Gly Trp Val Asp Val His Lys Asn Ile Thr
545 550 555 560

Leu Asp Gln Gly Phe Leu Asn Ile Thr Ala Ala Ser Val Ala Phe Glu
565 570 575

Gly Gly Asn Asn Lys Ala Arg Asp Ala Ala Asn Ala Lys Ile Val Ala
580 585 590

Gln Gly Thr Val Thr Ile Thr Gly Glu Gly Lys Asp Phe Arg Ala Asn
595 600 605

Asn Val Ser Leu Asn Gly Thr Gly Lys Gly Leu Asn Ile Ile Ser Ser
610 615 620

Val Asn Asn Leu Thr His Asn Leu Ser Gly Thr Ile Asn Ile Ser Gly
625 630 635 640

Asn Ile Thr Ile Asn Gln Thr Thr Arg Lys Asn Thr Ser Tyr Trp Gln
645 650 655

Thr Ser His Asp Ser His Trp Asn Val Ser Ala Leu Asn Leu Glu Thr
660 665 670

Gly Ala Asn Phe Thr Phe Ile Lys Tyr Ile Ser Ser Asn Ser Lys Gly
675 680 685

Leu Thr Thr Gln Tyr Arg Ser Ser Ala Gly Val Asn Phe Asn Gly Val
690 695 700

Asn Gly Asn Met Ser Phe Asn Leu Lys Glu Gly Ala Lys Val Asn Phe
705 710 715 720

Lys Leu Lys Pro Asn Glu Asn Met Asn Thr Ser Lys Pro Leu Pro Ile
725 730 735

Arg Phe Leu Ala Asn Ile Thr Ala Thr Gly Gly Gly Ser Val Phe Phe
740 745 750

Asp Ile Tyr Ala Asn His Ser Gly Arg Gly Ala Glu Leu Lys Met Ser
755 760 765
Glu Ile Asn Ile Ser Asn Gly Ala Asn Phe Thr Leu Asn Ser His Val
770 775 780

Arg Gly Asp Asp Ala Phe Lys Ile Asn Lys Asp Leu Thr Ile Asn Ala
785 790 795 800

Thr Asn Ser Asn Phe Ser Leu Arg Gln Thr Lys Asp Asp Phe Tyr Asp
805 810 815

Gly Tyr Ala Arg Asn Ala Ile Asn Ser Thr Tyr Asn Ile Ser Ile Leu
820 825 830

Gly Gly Asn Val Thr Leu Gly Gly Gln Asn Ser Ser Ser Ser Ile Thr
835 840 845

Gly Asn Ile Thr Ile Glu Lys Ala Ala Asn Val Thr Leu Glu Ala Asn
850 855 860

Asn Ala Pro Asn Gln Gln Asn Ile Arg Asp Arg Val Ile Lys Leu Gly
865 870 875 880

Ser Leu Leu Val Asn Gly Ser Leu Ser Leu Thr Gly Glu Asn Ala Asp 
885 890 895 

Ile Lys Gly Asn Leu Thr Ile Ser Glu Ser Ala Thr Phe Lys Gly Lys
900 905 910

Thr Arg Asp Thr Leu Asn Ile Thr Gly Asn Phe Thr Asn Asn Gly Thr
915 920 925

Ala Glu Ile Asn Ile Thr Gln Gly Val Val Lys Leu Gly Asn Val Thr
930 935 940

Asn Asp Gly Asp Leu Asn Ile Thr Thr His Ala Lys Arg Asn Gln Arg
945 950 955 960

Ser Ile Ile Gly Gly Asp Ile Ile Asn Lys Lys Gly Ser Leu Asn Ile
965 970 975

Thr Asp Ser Asn Asn Asp Ala Glu Ile Gln Ile Gly Gly Asn Ile Ser
980 985 990

Gln Lys Glu Gly Asn Leu Thr Ile Ser Ser Asp Lys Ile Asn Ile Thr
995 1000 1005

Lys Gln Ile Thr Ile Lys Lys Gly Ile Asp Gly Glu Asp Ser Ser Ser
1010 1015 1020

Asp Ala Thr Ser Asn Ala Asn Leu Thr Ile Lys Thr Lys Glu Leu Lys
1025 1030 1035 1040

Leu Thr Glu Asp Leu Ser Ile Ser Gly Phe Asn Lys Ala Glu Ile Thr
1045 1050 1055

Ala Lys Asp Gly Arg Asp Leu Thr Ile Gly Asn Ser Asn Asp Gly Asn
1060 1065 1070

Ser Gly Ala Glu Ala Lys Thr Val Thr Phe Asn Asn Val Lys Asp Ser 
1075 1080 1085 

Lys Ile Ser Ala Asp Gly His Asn Val Thr Leu Asn Ser Lys Val Lys
1090 1095 1100

Thr Ser Ser Ser Asn Gly Gly Arg Glu Ser Asn Ser Asp Asn Asp Thr
1105 1110 1115 1120
Gly Leu Thr Ile Thr Ala Lys Asn Val Glu Val Asn Lys Asp Ile Thr
1125 1130 1135

Ser Leu Lys Thr Val Asn Ile Thr Ala Ser Glu Lys Val Thr Thr Thr
1140 1145 1150

Ala Gly Ser Thr Ile Asn Ala Thr Asn Gly Lys Ala Ser Ile Thr Thr
1155 1160 1165

Lys Thr Gly Asp Ile Ser Gly Thr Ile Ser Gly Asn Thr Val Ser Val
1170 1175 1180

Ser Ala Thr Val Asp Leu Thr Thr Lys Ser Gly Ser Lys Ile Glu Ala
1185 1190 1195 1200

Lys Ser Gly Glu Ala Asn Val Thr Ser Ala Thr Gly Thr Ile Gly Gly
1205 1210 1215

Thr Ile Ser Gly Asn Thr Val Asn Val Thr Ala Asn Ala Gly Asp Leu
1220 1225 1230

Thr Val Gly Asn Gly Ala Glu Ile Asn Ala Thr Glu Gly Ala Ala Thr
1235 1240 1245

Leu Thr Ala Thr Gly Asn Thr Leu Thr Thr Glu Ala Gly Ser Ser Ile
1250 1255 1260

Thr Ser Thr Lys Gly Gln Val Asp Leu Leu Ala Gln Asn Gly Ser Ile
1265 1270 1275 1280

Ala Gly Ser Ile Asn Ala Ala Asn Val Thr Leu Asn Thr Thr Gly Thr
1285 1290 1295

Leu Thr Thr Val Ala Gly Ser Asp Ile Lys Ala Thr Ser Gly Thr Leu
1300 1305 1310

Val Ile Asn Ala Lys Asp Ala Lys Leu Asn Gly Asp Ala Ser Gly Asp
1315 1320 1325

Ser Thr Glu Val Asn Ala Val Asn Ala Ser Gly Ser Gly Ser Val Thr
1330 1335 1340

Ala Ala Thr Ser Ser Ser Val Asn Ile Thr Gly Asp Leu Asn Thr Val
1345 1350 1355 1360

Asn Gly Leu Asn Ile Ile Ser Lys Asp Gly Arg Asn Thr Val Arg Leu
1365 1370 1375

Arg Gly Lys Glu Ile Glu Val Lys Tyr Ile Gln Pro Gly Val Ala Ser
1380 1385 1390

Val Glu Glu Val Ile Glu Ala Lys Arg Val Leu Glu Lys Val Lys Asp
1395 1400 1405

Leu Ser Asp Glu Glu Arg Glu Thr Leu Ala Lys Leu Gly Val Ser Ala
1410 1415 1420

Val Arg Phe Val Glu Pro Asn Asn Thr Ile Thr Val Asn Thr Gln Asn
1425 1430 1435 1440

Glu Phe Thr Thr Arg Pro Ser Ser Gln Val Ile Ile Ser Glu Gly Lys
1445 1450 1455

Ala Cys Phe Ser Ser Gly Asn Gly Ala Arg Val Cys Thr Asn Val Ala 
1460 1465 1470 



WO 97/36914 



PCT/US97/04707 



78 

Asp Asp Gly Gln Pro
1475 
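Collapsing the three-letter listing of SEQ ID NO: 4 to one-letter code makes it easier to compare with a translation of the nucleotide sequence. A minimal Python sketch follows, assuming a hypothetical helper named `three_to_one`: numeric position labels are ignored, and any token that is not one of the 20 standard codes (for example an OCR misreading such as "He" for "Ile") is set aside for manual review rather than guessed. For the complete listing, the length of the returned string should equal the stated LENGTH of 1477 amino acids once every token is recognized.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(listing):
    """Return (one_letter_sequence, unrecognized_tokens) for a listing.

    Purely numeric tokens (position labels) are skipped; any other token
    that is not a standard three-letter code is collected, not translated.
    """
    seq, unknown = [], []
    for token in listing.split():
        if token.isdigit():
            continue
        code = THREE_TO_ONE.get(token)
        if code is None:
            unknown.append(token)
        else:
            seq.append(code)
    return "".join(seq), unknown
```

For example, the first line of SEQ ID NO: 4 with its position labels yields `("MNKIYRLKFS", [])`.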



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9171 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



ACAGCGTTCT CTTAATACTA GTACAAACCC ACAATAAAAT ATGACAAACA ACAATTACAA 60

CACCTTTTTT GCAGTCTATA TGCAAATATT TTAAAAAATA GTATAAATCC GCCATATAAA 120

ATGGTATAAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC ATCTTTCATC 180

TTTCATCTTT CATCTTTCAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC 240

ACATGAAATG ATGAACCGAG GGAAGGGAGG GAGGGGCAAG AATGAAGAGG GAGCTGAACG 300

AACGCAAATG ATAAAGTAAT TTAATTGTTC AACTAACCTT AGGAGAAAAT ATGAACAAGA 360

TATATCGTCT CAAATTCAGC AAACGCCTGA ATGCTTTGGT TGCTGTGTCT GAATTGGCAC 420

GGGGTTGTGA CCATTCCACA GAAAAAGGCA GCGAAAAACC TGCTCGCATG AAAGTGCGTC 480

ACTTAGCGTT AAAGCCACTT TCCGCTATGT TACTATCTTT AGGTGTAACA TCTATTCCAC 540

AATCTGTTTT AGCAAGCGGC TTACAAGGAA TGGATGTAGT ACACGGCACA GCCACTATGC 600

AAGTAGATGG TAATAAAACC ATTATCCGCA ACAGTGTTGA CGCTATCATT AATTGGAAAC 660

AATTTAACAT CGACCAAAAT GAAATGGTGC AGTTTTTACA AGAAAACAAC AACTCCGCCG 720

TATTCAACCG TGTTACATCT AACCAAATCT CCCAATTAAA AGGGATTTTA GATTCTAACG 780

GACAAGTCTT TTTAATCAAC CCAAATGGTA TCACAATAGG TAAAGACGCA ATTATTAACA 840

CTAATGGCTT TACGGCTTCT ACGCTAGACA TTTCTAACGA AAACATCAAG GCGCGTAATT 900

TCACCTTCGA GCAAACCAAA GATAAAGCGC TCGCTGAAAT TGTGAATCAC GGTTTAATTA 960

CTGTCGGTAA AGACGGCAGT GTAAATCTTA TTGGTGGCAA AGTGAAAAAC GAGGGTGTGA 1020

TTAGCGTAAA TGGTGGCAGC ATTTCTTTAC TCGCAGGGCA AAAAATCACC ATCAGCGATA 1080

TAATAAACCC AACCATTACT TACAGCATTG CCGCGCCTGA AAATGAAGCG GTCAATCTGG 1140

GCGATATTTT TGCCAAAGGC GGTAACATTA ATGTCCGTGC TGCCACTATT CGAAACCAAG 1200

CTTTCCGCCA AAGAGGGTGA AGCGGAAATT GGCGGTGTAA TTTCCGCTCA AAATCAGCAA 1260

GCTAAAGGCG GCAAGCTGAT GATTACAGGC GATAAAGTCA CATTAAAAAC AGGTGCAGTT 1320

ATCGACCTTT CAGGTAAAGA AGGGGGAGAA ACTTACCTTG GCGGTGACGA GCGCGGCGAA 1380

GGTAAAAACG GCATTCAATT AGCAAAGAAA ACCTCTTTAG AAAAAGGCTC AACCATCAAT 1440

GTATCAGGCA AAGAAAAAGG CGGACGCGCT ATTGTGTGGG GCGATATTGC GTTAATTGAC 1500
GGCAATATTA ACGCTCAAGG TAGTGGTGAT ATCGCTAAAA CCGGTGGTTT TGTGGAGACG
TCGGGGCATG ATTTATTCAT CAAAGACAAT GCAATTGTTG ACGCCAAAGA GTGGTTGTTA 
GACCCGGATA ATGTATCTAT TAATGCAGAA ACAGCAGGAC GCAGCAATAC TTCAGAAGAC 
GATGAATACA CGGGATCCGG GAATAGTGCC AGCACCCCAA AACGAAACAA AGAAAAGACA 
ACATTAACAA ACACAACTCT TGAGAGTATA CTAAAAAAAG GTACCTTTCT TAACATCACT 
GCTAATCAAC GCATCTATGT CAATAGCTCC ATTAATTTAT CCAATGGCAG CTTAACTCTT 
TGGAGTGAGG GTCGGAGCGG TGGCGGCGTT GAGATTAACA ACGATATTAC CACCGGTGAT 
GATACCAGAG GTGCAAACTT AACAATTTAC TCAGGCGGCT GGGTTGATGT TCATAAAAAT
ATCTCACTCG GGGCGCAAGG TAACATAAAC ATTACAGCTA AACAAGATAT CGCCTTTGAG 
AAAGGAAGCA ACCAAGTCAT TACAGGTCAA GGGACTATTA CCTCAGGCAA TCAAAAAGGT 
TTTAGATTTA ATAATGTCTC TCTAAACGGC ACTGGCAGCG GACTGCAATT CACCACTAAA 
AGAACCAATA AATACGCTAT CACAAATAAA TTTGAAGGGA CTTTAAATAT TTCAGGGAAA 
GTCAACATCT CAATGGTTTT ACCTAAAAAT GAAAGTGGAT ATGATAAATT CAAAGGACGC 
ACTTACTGGA ATTTAACCTC GAAAGTGGAT ATGATAAATT CAAAGGACGC CCTCACTATT 
GACTCCAGAG GAAGCGATAG TGCAGGCACA CTTACCCAGC CTTATAATTT AAACGGTATA 
TCATTCAACA AAGACACTAC CTTTAATGTT GAACGAAATG C^AGAGTCAA CTTTGACATC 
AAGGCACCAA TAGGGATAAA TAAGTATTCT AGTTTGAATT ACGCATCATT TAATGGAAAC 
ATTTCAGTTT CGGGAGQQQG GAGTGTTGAT TTCACACTTC TCGCCTCATC CTCTAACGTC 
CAAACCCCCG GTGTAGTTAT AAATTCTAAA TACTTTAATG TTTCAACAGG GTCAAGTTTA 
AGATTTAAAA CTTCAGGCTC AACAAAAACT GGCTTCTCAA TAGAGAAAGA TTTAACTTTA 
AATGCCACCG GAGGCAACAT AACACTTTTG CAAGTTGAAG GCACCGATGG AATGATTX3GT 
AAAGGCATTG TAGCCAAAAA AAACATAACC TTTGAAGGAG GTAAGATGAG GTTTGGCTCC 
AGGAAAGCCG TAACAGAAAT CGAAOGCAAT GTTACTATCA ATAACAACGC TAACGTCACT 
CTTATCGGTT CGGATTTTGA CAACCATCAA AAACCTTTAA CTATTAAAAA AGATGTCATC 
ATTAATAGCG GCAACCTTAC CGCTGGAGGC AATATTGTCA ATATAGCCGG AAATCTTACC 
GTTGAAAGTA ACGCTAATTT CAAAGCTATC ACAAATTTCA CTTTTAATGT AGGCGGCTTG 
TTTGACAACA AAGGCAATTC AAATATTTCC ATTGCCAAAG GAGGGGCTCG CTTTAAAGAC 
ATTGATAATT CCAAGAATTT AAGCATCACC ACCAACTCCA GCTCCACTTA CCGCACTATT 
ATAAGCGGCA ATATAACCAA TAAAAACGGT GATTTAAATA TTACGAACGA AGGTAGTGAT 
ACTGAAATGC AAATTGGCGG CGATGTCTCG CAAAAAGAAG GTAATCTCAC GATTTCTTCT
GACAAAATCA ATATTACCAA ACAGATAACA ATCAAGGCAG GTGTTGATGG GGAGAATTCC 
GATTCAGACG CGACAAACAA TGCCAATCTA ACCATTAAAA CCAAAGAATT GAAATTAACG 
CAAGACCTAA ATATTTCAGG TTTCAATAAA GCAGAGATTA CAGCTAAAGA TGGTAGTGAT 
TTAACTATTG GTAACACCAA TAGTGCTGAT GGTACTAATG CCAAAAAAGT AACCTTTAAC 
CAGGTTAAAG ATTCAAAAAT CTCTGCTGAC GGTCACAAGG TGACACTACA CAGCAAAGTG 3600

GAAACATCCG GTAGTAATAA CAACACTGAA GATAGCAGTG ACAATAATGC CGGCTTAACT 3660 

ATCGATGCAA AAAATGTAAC AGTAAACAAC AATATTACTT CTCACAAAGC AGTGAGCATC 3720 

TCTGCGACAA GTGGAGAAAT TACCACTAAA ACAGGTACAA CCATTAACGC AACCACTGGT 3780 

AACOTGGAGA TAACCGCTCA AACAGGTAGT ATCCTAGGTG GAATTGAGTC CAGCTCTGGC 3840 

TCTGTAACAC TTACTOCAAC CGAGGGCGCT CTTGCTGTAA GCAATATTTC GGGCAACACC 3900 

GTTACTGTTA CTGCAAATAG CGGTGCATTA ACCACTTTGG CAGGCTCTAC AATTAAAGGA 3960 

ACCGAGAGTG TAACCACTTC AAGTCAATCA GGCGATATCG GCGGTACGAT TTCTGGTGGC 4020 

ACAGTAGAGG TTAAAGCAAC CGAAAGTTTA ACCACTCAAT CCAATTCAAA AATTAAAGCA 4080

ACAACAGGCG AGGCTAACGT AACAAGTGCA ACAGGTACAA TTGGTGGTAC GATTTCCGGT 4140

AATACGGTAA ATGTTACGGC AAACGCTGGC GATTTAACAG TTGGGAATGG CGCAGAAATT 4200

AATGCGACAG AAGGAGCTGC AACCTTAACT ACATCATCGG GCAAATTAAC TACCGAAGCT 4260 

AGTTCACACA TTACTTCAGC CAAGGGTCAG GTAAATCTTT CAGCTCAGGA TGGTAGCGTT 4320 

GCAGGAAGTA TTAATGCCGC CAATGTGACA CTAAATACTA CAGCCACTTT AACTACCGTG 4380

AAGGGTTCAA ACATTAATGC AACCAGCGGT ACCTTGGTTA TTAACGCAAA AGACGCTGAG 4440 

CTAAATGGCG CAGCATTGGG TAACCACACA GTGGTAAATG CAACCAACGC AAATGGCTCC 4500 

GGCAGCGTAA TCGCGACAAC CTCAAGCAGA GTGAACATCA CTGGGGATTT AATCACAATA 4560 

AATGGATTAA ATATCATTTC AAAAAACGGT ATAAACACCG TACTGTTAAA AGGCGTTAAA 4620 

ATTGATGTQA AATACATTCA ACCGGGTATA GCAAGCGTAG ATGAAGTAAT TGAAGCGAAA 4680 

CGCATCCTTG AGAAGGTAAA AGATTTATCT GATGAAGAAA GAGAAGCGTT AGCTAAACTT 4740 

GGCGTAAGTG CTGTACGTTT TATTGAGCCA AATAATACAA TTACAGTCGA TACACAAAAT 4800 

GAATTTGCAA CCAGACCATT AAGTCGAATA GTGATTTCTG AAGGCAGGGC GTGTTTCTCA 4860 

AACAGTGATG GCGCGAOOGT GTGCGTTAAT ATCGCTGATA AOGGGCGGTA GCG6TCAOTA 4920 

ATTGACAAGG TAGATTTCAT CCTGCAATGA AGTCATTTTA TTTTCGTATT ATTTACTGTG 4980 

TGGGTTAAAG TTCAGTACGG GCTTTACCCA TCTTGTAAAA AATTACGGAG AATACAATAA 5040

AGTATTTTTA ACAGGTTATT ATTATGAAAA ATATAAAAAG CAGATTAAAA CTCAGTGCAA 5100 

TATCAGTATT GCTTGGCCTG GCTTCTTCAT CATTGTATGC AGAAGAAGCG TTTTTAGTAA 5160 

AAGGCTTTCA GTTATCTGGT GCACTTGAAA CTTTAAGTGA AGACGCCCAA CTGTCTGTAG 5220

CAAAATCTTT ATCTAAATAC CAAGGCTCGC AAACTTTAAC AAACCTAAAA ACAGCACAGC 5280

TTGAATTACA GGCTGTGCTA GATAAGATTG AGCCAAATAA GTTTGATGTG ATATTGCCAC 5340

AACAAACCAT TACGGATGGC AATATTATGT TTGAGCTAGT CTCGAAATCA GCCGCAGAAA 5400

GCCAAGTTTT TTATAAGOCO AGCCAGGGTT ATAGTGAAGA AAATATCGCT CGTAGCCTGC 5460 

CATCTTTGAA ACAAGGAAAA GTGTATGAAG ATGGTCGTCA GTGGTTCGAT TTGCGTGAAT 5520 

TCAATATGGC AAAAGAAAAT CCACTTAAAG TCACTCGCGT GCATTACGAG TTAAACCCTA 5580 
AAAACAAAAC CTCTGATTTG GTAGTTGCAG GTTTTTCGCC TTTTGGCAAA ACGCGTAGCT 5640

TTGTTTCCTA TGATAATTTC GGCGCAAGGG AGTTTAACTA TCAACGTGTA AGTCTAGGTT 5700

TTGTAAATGC CAATTTGACC GGACATGATG ATGTATTAAA TCTAAACGCA TTGACCAATG 5760

TAAAAGCACC ATCAAAATCT TATGCGGTAG GCATAGGATA TACTTATCCG TTTTATGATA 5820

AACACCAATC CTTAAGTCTT TATACCAGCA TGAGTTATGC TGATTCTAAT GATATCGACG 5880

GCTTACCAAG TGCGATTAAT CGTAAATTAT CAAAAGGTCA ATCTATCTCT GCGAATCTGA 5940

AATGGAGTTA TTATCTCCCG ACATTTAACC TTGGAATGGA AGACCAGTTT AAAATTAATT 6000

TAGGCTACAA CTACCGCCAT ATTAATCAAA CATCCGAGTT AAACACCCTG GGTGCAACGA 6060

AGAAAAAATT TGCAGTATCA GGCGTAAGTG CAGGCATTGA TGGACATATC CAATTTACCC 6120

CTAAAACAAT CTTTAATATT GATTTAACTC ATCATTATTA CGCGAGTAAA TTACCAGGCT 6180

CTTTTGGAAT GGAGCGCATT GGCGAAACAT TTAATCGCAG CTATCACATT AGCACAGCCA 6240

GTTTAGGGTT GAGTCAAGAG TTTGCTCAAG GTTGGCATTT TAGCAGTCAA TTATCGGGTC 6300

AGTTTACTCT ACAAGATATA AGTAGCATAG ATTTATTCTC TGTAACAGGT ACTTATGGCG 6360

TCAGAGGCTT TAAATACGGC GGTGCAAGTG GTGAGCGCGG TCTTGTATGG CGTAATGAAT 6420

TAAGTATGCC AAAATACACC CGCTTTCAAA TCAGCCCTTA TGCGTTTTAT GATGCAGGTC 6480

AGTTCCGTTA TAATAGCGAA AATGCTAAAA CTTACGGCGA AGATATGCAC ACGGTATCCT 6540

CTGCGGGTTT AGGCATTAAA ACCTCTCCTA CACAAAACTT AAGCTTAGAT GCTTTTGTTG 6600

CTCGTCGCTT TGCAAATGCC AATAGTGACA ATTTGAATGG CAACAAAAAA CGCACAAGCT 6660

CACCTACAAC CTTCTGGGGT AGATTAACAT TCAGTTTCTA ACCCTGAAAT TTAATCAACT 6720

GGTAAGCGTT CCGCCTACCA GTTTATAACT ATATGCTTTA CCCGCCAATT TACAGTCTAT 6780

ACGCAACCCT GTTTTCATCC TTATATATCA AACAAACTAA GCAAACCAAG CAAACCAAGC 6840

AAACCAAGCA AACCAAGCAA ACCAAGCAAA CCAAGCAAAC CAAGCAAACC AAGCAAACCA 6900

AGCAAACCAA GCAAACCAAG CAAACCAAGC AAACCAAGCA ATGCTAAAAA ACAATTTATA 6960

TGATAAACTA AAACATACTC CATACCATGG CAATACAAGG GATTTAATAA TATGACAAAA 7020

GAAAATTTAC AAAGTGTTCC ACAAAATACG ACCGCTTCAC TTGTAGAATC AAACAACGAC 7080

CAAACTTCCC TGCAAATACT TAAACAACCA CCCAAACCCA ACCTATTACG CCTGGAACAA 7140

CATGTCGCCA AAAAAGATTA TGAGCTTGCT TGCCGCGAAT TAATGGCGAT TTTGGAAAAA 7200

ATGGACGCTA ATTTTGGAGG CGTTCACGAT ATTGAATTTG ACGCACCTGC TCAGCTGGCA 7260

[illegible] 7320

CTCTTTTCCG ACCCCGAATT GGCAATTTCC GAAGAAGGGG CATTAAAGAT GATTAGCCTG 7380

CAACGCTGGT TGACGCTGAT TTTTGCCTCT TCCCCCTACG TTAACGCAGA CCATATTCTC 7440

AATAAATATA ATATCAACCC AGATTCCGAA GGTGGCTTTC ATTTAGCAAC AGACAACTCT 7500

TCTATTGCTA AATTCTGTAT TTTTTACTTA CCCGAATCCA ATGTCAATAT GAGTTTAGAT 7560

GCGTTATGGG CAGGGAATCA ACAACTTTGT GCTTCATTGT GTTTTGCGTT GCAGTCTTCA 7620
CGTTTTATTG GTACTGCATC TGCGTTTCAT AAAAGAGCGG TGGTTTTACA GTGGTTTCCT 7680

AAAAAACTCG CCGAAATTGC TAATTTAGAT GAATTGCCTG CAAATATCCT TCATGATGTA 7740 

TATATGCACT GCAGTTATGA TTTAGCAAAA AACAAGCACG ATGTTAAGCG TCCATTAAAC 7800 

GAACTTGTCC GCAAGCATAT CCTCACGCAA GGATGGCAAG ACCGCTACCT TTACACCTTA 7860

GGTAAAAAGG ACGGCAAACC TGTGATGATG GTACTGCTTG AACATTTTAA TTCGGGACAT 7920 

TCGATTTATC GCACGCATTC AACTTCAATG ATTGCTGCTC GAGAAAAATT CTATTTAGTC 7980 

GGCTTAGGCC ATGAGGGCGT TGATAACATA GGTCGAGAAG TGTTTGACGA GTTCTTTGAA 8040 

ATCAGTAGCA ATAATATAAT GGAGAGACTG TTTTTTATCC GTAAACAGTG CGAAACTTTC 8100

CAACCCGCAG TGTTCTATAT GCCAAGCATT GGCATGGATA TTAGCACGAT TTTTGTGAGC 8160

AACACTCGGC TTGCCCCTAT TCAAGCTGTA GCCTTGGGTC ATCCTGCCAC TACGCATTCT 8220

GAATTTATTG ATTATGTCAT CGTAGAAGAT GATTATGTGG GCAGTGAAGA TTGTTTTAGC 8280 

GAAACCCTTT TACGCTTACC CAAAGATGCC CTACCTTATG TACCATCTGC ACTCGCCCCA 8340 

CAAAAAGTGG ATTATGTACT CAGGGAAAAC CCTGAAGTAG TCAATATCGG TATTGCCGCT 8400 

ACCACAATGA AATTAAACCC TGAATTTTTG CTAACATTGC AAGAAATCAG AGATAAAGCT 8460 

AAAGTCAAAA TACATTTTCA TTTCGCACTT GGACAATCAA CAGGCTTGAC ACACCCTTAT 8520 

GTCAAATGGT TTATCGAAAG CTATTTAGGT GACGATGCCA CTGCACATCC CCAOGGACCT 8580 

TATCACGATT ATCTGGCAAT ATTGCGTGAT TGCGATATGC TACTAAATCC GTTTCCTTTC 8640 

GGTAATACTA ACGGCATAAT TGATATGGTT ACATTAGGTT TAGTTGGTGT ATGCAAAACG 8700 

GGGGATGAAG TACATGAACA TATTGATGAA GGTCTGTTTA AACGCTTAGG ACTACCAGAA 8760 

TGGCTGATAG CCGACACACG AGAAACATAT ATTGAATGTG CTTTGCGTCT AGCAGAAAAC 8820 

CATCAAGAAC GCCTTGAACT CCGTCGTTAC ATCATAGAAA ACAACGGCTT ACAAAAGCTT 8880 

TTTACAGGCG ACCCTCGTCC ATTGGGCAAA ATACTGCTTA AGAAAACAAA TGAATGGAAG 8940 

CGGAAGCACT TGAGTAAAAA ATAACGGTTT TTTAAAGTAA AAGTGCGGTT AATTTTCAAA 9000 

GCGTTTTAAA AACCTCTCAA AAATCAACCG CACTTTTATC TTTATAACGC TCCCGOGCGC 9060 

TOACAGTTTA TCTCTTTCTT AAAATACCCA TAAAATTGTG GCAATAGTTG GGTAATCAAA 9120 

TTCAATTGTT GATACGGCAA ACTAAAGACG GOGOGTTCTT CGGCAGTCAT C 9171 
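Within SEQ ID NO: 5, the ATG at positions 351 to 353 opens a reading frame whose first ten codons (ATG AAC AAG ATA TAT CGT CTC AAA TTC AGC) encode Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser, the amino terminus of SEQ ID NO: 4. A minimal Python sketch of such a translation follows, assuming the standard genetic code (the bacterial code assigns the same amino acid to every codon) and a hypothetical helper named `translate`; codons containing an unreadable character are rendered as X, and translation stops at the first stop codon unless told otherwise.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest); '*' marks a stop codon.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W"
          "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR"
          "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna, start=0, stop_at_stop=True):
    """Translate `dna` from 0-based index `start` in steps of three bases.

    Whitespace (the spaces between 10-base groups) is removed first.
    Codons containing anything other than A, C, G or T become 'X'.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

With the whole listing held in a string (here a hypothetical `seq5`, position labels removed), `translate(seq5, start=350)` would begin at position 351, because Python indices are zero-based.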
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9323 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

CGCCACTTCA ATTTTGGATT GTTGAAATTC AACTAACCAA AAAGTGCGGT TAAAATCTGT 60 
GGAGAAAATA GGTTGTAGTG AAGAACGAGG [illegible] AAAGGATAAA GCTCTTCTTAA 120

TTGGGCATTG GTTGGCGTTT CTTTTTCGGT TAATAGTAAA TTATATTCTG GACGACTATG 180

CAATCCACCA ACAACTTTAC CGTTGGTTTT AAGCGTTAAT GTAAGTTCTT GCTCTTCTTG 240

GCGAATACGT AATCCCATTT TTTGTTTAGC AAGAAAATGA TCGGGATAAT CATAATAOGT 300

GTTGCCCAAA AATAAATTTT GATGTTCTAA AATCATAAAT [illegible] 360

TTCAATACCT ATTTGTGGCG AAATCGCCAA TTTTAATTCA ATTTCTTGTA GCATAATATT 420

TCCCACTCAA ATCAACTGGT TAAATATACA [illegible] 480

ATGACAAACA ACAATTACAA CACCTTTTTT [illegible] 540

AGTATAAATC CGCCATATAA AATGGTATAA TCTTTCATCT TTCATCTTTC ATCTTTCATC 600

TTTCATCTTT CATCTTTCAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC 660

ATCTTTCATC TTTCATCTTT CACATGAAAT GATGAACCGA GGGAAGGGAG GGAGGGGCAA 720

GAATGAAGAG GGAGCTGAAC GAACGCAAAT GATAAAGTAA TTTAATTGTT CAACTAACCT 780

TAGGAGAAAA TATGAACAAG ATATATCGTC TCAAATTCAG CAAACGCCTG AATGCTTTGG 840

TTGCTGTGTC TGAATTGGCA CGGGGTTGTG ACCATTCCAC AGAAAAAGGC AGCGAAAAAC 900

CTGCTCGCAT GAAAGTGCGT CACTTAGCGT TAAAGCCACT TTCCGCTATG TTACTATCTT 960

TAGGTGTAAC ATCTATTCCA CAATCTGTTT TAGCAAGCGG CAATTTAACA TCGACCAAAA 1020

TGAAATGGTG CAGTTTTTAC AAGAAAACAA GTAATAAAAC CATTATCCGC AACAGTGTTG 1080

ACGCTATCAT TAATTGGAAA CAATTTAACA TCGACCAAAA TGAAATGGTG CAGTTTTTAC 1140

AAGAAAACAA CAACTCCGCC GTATTCAACC GTGTTACATC TAACCAAATC TCCCAATTAA 1200

AAGGGATTTT AGATTCTAAC GGACAAGTCT TTTTAATCAA CCCAAATGGT ATCACAATAG 1260

GTAAAGACGC AATTATTAAC ACTAATGGCT TTACGGCTTC TACGCTAGAC ATTTCTAACG 1320

[illegible] AGCAAACCAA [illegible] CTCGCTGAAA 1380

TTGTGAATCA CGGTTTAATT ACTGTCGGTA [illegible] 1440

AAGTGAAAAA CGAGGGTGTG ATTAGCGTAA [illegible] 1500

AAAAAATCAC CATCAGCGAT ATAATAAACC [illegible] 1560

AAAATGAAGC GGTCAATCTG GGCGATATTT [illegible] 1620

CTGCCACTAT TCGAAACCAA GGTAAACTTT [illegible] 1680

GCAATATTGT TCTTTCCGCC AAAGAGGGTG [illegible] 1740

AAAATCAGCA AGCTAAAGGC GGCAAGCTGA TGATTACAGG CGATAAAGTC ACATTAAAAA 1800

CAGGTGCAGT TATCGACCTT TCAGGTAAAG AAGGGGGAGA AACTTACCTT GGCGGTGACG 1860

AGCGCGGCGA AGGTAAAAAC GGCATTCAAT TAGCAAAGAA AACCTCTTTA GAAAAAGGCT 1920

CAACCATCAA TGTATCAGGC AAAGAAAAAG GCGGACGCGC TATTGTGTGG GGCGATATTG 1980

CGTTAATTGA CGGCAATATT AACGCTCAAG GTAGTGGTGA TATCGCTAAA ACCGGTGGTT 2040

TTGTGGAGAC ATCGGGGCAT TATTTATCCA TTGACAGCAA TGCAATTGTT AAAACAAAAG 2100
AGTGGTTGCT AGACCCTGAT GATGTAACAA TTGAAGCCGA AGACCCCCTT CGCAATAATA 2160

CCGGTATAAA TGATGAATTC CCAACAGGCA CCGGTGAAGC AAGCGACCCT AAAAAAAATA 2220 

GCGAACTCAA AACAACGCTA ACCAATACAA CTATTTCAAA TTATCTGAAA AACGCCTGGA 2280

CAATGAATAT AACGGCATCA AGAAAACTTA CCGTTAATAG CTCAATCAAC ATCGGAAGCA 2340 

ACTCCCACTT AATTCTCCAT AGTAAAGGTC AGCGTGGCGG AGGCGTTCAG ATTGATGGAG 2400

ATATTACTTC TAAAGGCGGA AATTTAACCA TTTATTCTGG CGGATGGGTT GATGTTCATA 2460

AAAATATTAC GCTTGATCAG GGTTTTTTAA ATATTACCGC CGCTTCCGTA GCTTTTGAAG 2520

GTGGAAATAA CAAAGCACGC GACGCGGCAA ATGCTAAAAT TGTCGCCCAG GGCACTGTAA 2580

CCATTACAGG AGAGGGAAAA GATTTCAGGG CTAACAACGT ATCTTTAAAC GGAACGGGTA 2640 

AAGGTCTCAA TATCATTTCA TCAGTGAATA ATTTAACCCA CAATCTTAGT GGCACAATTA 2700 

ACATATCTGG GAATATAACA ATTAACCAAA CTACGAGAAA GAACACCTCG TATTGGCAAA 2760

CCAGCCATGA TTCGCACTGG AACGTCAGTG CTCTTAATCT AGAGACAGGC GCAAATTTTA 2820 

CCTTTATTAA ATACATTTCA AGCAATAGCA AAGGCTTAAC AACACAGTAT AGAAGCTCTG 2880 

CAGGGGTGAA TTTTAACGGC GTAAATGGCA ACATGTCATT CAATCTCAAA GAAGGAGCGA 2940

AAGTTAATTT CAAATTAAAA CCAAACGAGA ACATGAACAC AAGCAAACCT TTACCAATTC 3000 

GGTTTTTAGC CAATATCACA GCCACTGGTG GGGGCTCTGT TTTTTTTGAT ATATATGCCA 3060 

ACCATTCTGG CAGAGGGGCT GAGTTAAAAA TGAGTGAAAT TAATATCTCT AACGGCGCTA 3120

ATTTTACCTT AAATTCCCAT GTTCGCGGCG ATGACGCTTT TAAAATCAAC AAAGACTTAA 3180

CCATAAATGC AACCAATTCA AATTTCAGCC TCAGACAGAC GAAAGATGAT TTTTATGACG 3240

GGTACGCACG CAATGCCATC AATTCAACCT ACAACATATC CATTCTGGGC GGTAATGTCA 3300

CCCTTGGTGG ACAAAACTCA AGCAGCAGCA TTACGGGGAA TATTACTATC GAGAAAGCAG 3360 

CAAATGTTAC GCTAGAAGCC AATAACGCCC CTAATCAGCA AAACATAAGG GATAGAGTTA 3420 

TAAAACTTGG CAGCTTGCTC GTTAATGGGA GTTTAAGTTT AACTGGCGAA AATGCAGATA 3480

TTAAAGGCAA TCTCACTATT TCAGAAAGCG CCACTTTTAA AGGAAAGACT AGAGATACCC 3540 

TAAATATCAC CGGCAATTTT ACCAATAATG GCACTGCCGA AATTAATATA ACACAAGGAG 3600 

TGGTAAAACT TGGCAATGTT ACCAATGATG GTGATTTAAA CATTACCACT CACGCTAAAC 3660 

GCAACCAAAG AAGCATCATC GGCGGAGATA TAATCAACAA AAAAGGAAGC TTAAATATTA 3720 

CAGACAGTAA TAATGATGCT GAAATCCAAA TTGGCGGCAA TATCTCGCAA AAAGAAGGCA 3780 

ACCTCACGAT TTCTTCCGAT AAAATTAATA TCACCAAACA GATAACAATC AAAAAGGGTA 3840

TTGATGGAGA GGACTCTAGT TCAGATGCGA CAAGTAATGC CAACCTAACT ATTAAAACCA 3900

AAGAATTGAA ATTGACAGAA GACCTAAGTA TTTCAGGTTT CAATAAAGCA GAGATTACAG 3960 

CCAAAGATGG TAGAGATTTA ACTATTGGCA ACAGTAATGA CGGTAACAGC GGTGCCGAAG 4020 

CCAAAACAGT AACTTTTAAC AATGTTAAAG ATTCAAAAAT CTCTGCTGAC GGTCACAATG 4080 

TGACACTAAA TAGCAAAGTG AAAACATCTA GCAGCAATGG CGGACGTGAA AGCAATAGCG 4140
ACAACGATAC CGGCTTAACT ATTACTGCAA AAAATGTAGA AGTAAACAAA GATATTACTT 4200

CTCTCAAAAC AGTAAATATC ACCGCGTCGG AAAAGGTTAC CACCACAGCA GGCTCGACCA 4260 

TTAACGCAAC AAATGGCAAA GCAAGTATTA CAACCAAAAC AGGTGATATC AGCGGTACGA 4320 

TTTCCGGTAA CACGGTAAGT GTTAGCGCGA CTGGTGATTT AACCACTAAA TCCGGCTCAA 4380 

AAATTGAAGC GAAATCGGGT GAGGCTAATG TAACAAGTGC AACAGGTACA ATTGGCGGTA 4440 

CAATTTCCGG TAATACGGTA AATGTTACGG CAAACGCTGG CGATTTAACA GTTGGGAATG 4500 

GCGCAGAAAT TAATGCGACA GAAGGAGCTG CAACCTTAAC CGCAACAGGG AATACCTTGA 4560

CTACTGAAGC CGGTTCTAGC ATCACTTCAA CTAAGGGTCA GGTAGACCTC TTGGCTCAGA 4620

ATGGTAGCAT CGCAGGAAGC ATTAATGCTG CTAATGTGAC ATTAAATACT ACAGGCACCT 4680 

TAACCACCGT GGCAGGCTCG GATATTAAAG CAACCAGCGG CACCTTGGTT ATTAACGCAA 4740 

AAGATGCTAA GCTAAATGGT GATGCATCAG GTGATAGTAC AGAAGTGAAT GCAGTCAACG 4800

CAAGCGGCTC TGGTAGTGTG ACTGCGGCAA CCTCAAGCAG TGTGAATATC ACTGGGGATT 4860

TAAACACAGT AAATGGGTTA AATATCATTT CGAAAGATGG TAGAAACACT GTGCGCTTAA 4920 

GAGGCAAGGA AATTGAGGTG AAATATATCC AGCCAGGTGT AGCAAGTGTA GAAGAAGTAA 4980 

TTGAAGCGAA ACGCGTCCTT GAAAAAGTAA AAGATTTATC TGATGAAGAA AGAGAAACAT 5040 

TAGCTAAACT TGGTGTAAGT GCTGTACGTT TTGTTGAGCC AAATAATACA ATTACAGTCA 5100 

ATACACAAAA TGAATTTACA ACCAGACCGT CAAGTCAAGT GATAATTTCT GAAGGTAAGG 5160

CGTGTTTCTC AAGTGGTAAT GGCGCACGAG TATGTACCAA TGTTGCTGAC GATGGACAGC 5220

CGTAGTCAGT AATTGACAAG GTAGATTTCA TCCTGCAATG AAGTCATTTT ATTTTCGTAT 5280 

TATTTACTGT GTGGGTTAAA GTTCAGTACG GGCTTTACCC ATCTTGTAAA AAATTACGGA 5340 

GAATACAATA AAGTATTTTT AACAGGTTAT TATTATGAAA AATATAAAAA GCAGATTAAA 5400 

ACTCAGTGCA ATATCAGTAT TGCTTGGCCT GGCTTCTTCA TCATTGTATG CAGAAGAAGC 5460 

GTTTTTAGTA AAAGGCTTTC AGTTATCTGG TGCACTTGAA ACTTTAAGTG AAGACGCCCA 5520 

ACTGTCTGTA GCAAAATCTT TATCTAAATA CCAAGGCTCG CAAACTTTAA CAAACCTAAA 5580

AACAGCACAG CTTGAATTAC AGGCTGTGCT AGATAAGATT GAGCCAAATA AATTTGATGT 5640 

GATATTGCCG CAACAAACCA TTACGGATGG CAATATCATG TTTGAGCTAG TCTCGAAATC 5700

AGCCGCAGAA AGCCAAGTTT TTTATAAGGC QAGCCAGGGT TATAGTGAAG AAAATATCGC 5760 

TCGTAGCCTG CCATCTTTGA AACAAGGAAA AGTGTATGAA GATGGTCGTC AGTGGTTCGA 5820 

TTTGCGTGAA TTTAATATGG CAAAAGAAAA CCOGCTTAAG GTTACCCGTG TACATTACGA 5880

ACTAAACCCT AAAAACAAAA CCTCTAATTT GATAATTGOG OGCTTCTOGC CTTTTGGTAA 5940 

AACGCGTAGC TTTATTTCTT ATGATAATTT OGGCGOGAGA GAGTTTAACT ACCAACGTGT 6000 

AAGCTTGGGT TTTGTTAATG CCAATTTAAC TGGTCATGAT GATGTGTTAA TTATACCAGT 6060 

ATGAGTTATG CTGATTCTAA TGATATCGAC GGCTTACCAA GTGCGATTAA TCGTAAATTA 6120

TCAAAAGGTC AATCTATCTC TGCGAATCTG AAATGGAGTT ATTATCTCCC AACATTTAAC 6180 
CTTGGCATGG AAGACCAATT TAAAATTAAT TTAGGCTACA ACTACCGCCA TATTAATCAA 6240

ACCTCCGCGT TAAATCGCTT GGGTGAAACG AAGAAAAAAT TTGCAGTATC AGGCGTAAGT 6300 

GCAGGCATTG ATGGACATAT CCAATTTACC CCTAAAACAA TCTTTAATAT TGATTTAACT 6360 

CATCATTATT ACGCGAGTAA ATTACCAGGC TCTTTTGGAA TGGAGCGCAT TGGCGAAACA 6420 

TTTAATCGCA GCTATCACAT TAGCACAGCC AGTTTAGGGT TGAGTCAAGA GTTTGCTCAA 6480 

GGTTGGCATT TTAGCAGTCA ATTATCAGGT CAATTTACTC TACAAGATAT TAGCAGTATA 6540 

GATTTATTCT CTGTAACAGG TACTTATGGC GTCAGAGGCT TTAAATACGG CGGTGCAAGT 6600 

GGTGAGCGCG GTCTTGTATG GCGTAATGAA TTAAGTATGC CAAAATACAC CCGCTTCCAA 6660

ATCAGCCCTT ATGCGTTTTA TGATGCAGGT CAGTTCCGTT ATAATAGCGA AAATGCTAAA 6720 

ACTTACGGCG AAGATATGCA CACGGTATCC TCTGCGGGTT TAGGCATTAA AACCTCTCCT 6780

ACACAAAACT TAAGCCTAGA TGCTTTTGTT GCTCGTCGCT TTGCAAATGC CAATAGTGAC 6840

AATTTGAATG GCAACAAAAA ACGCACAAGC TCACCTACAA CCTTCTGGGG GAGATTAACA 6900 

TTCAGTTTCT AACCCTGAAA TTTAATCAAC TGGTAAGOGT TGCGCCTACC AGTTTATAAC 6960 

TATATGCTTT ACCCGCCAAT TTACAGTCTA TAGGCAACCC TGTTTTTACC CTTATATATC 7020 

AAATAAACAA GCTAAGCTGA GCTAAGCAAA CCAAGCAAAC TCAAGCAAGC CAAGTAATAC 7080 

TAAAAAAACA ATTTATATGA TAAACTAAAG TATACTCCAT GCCATGGCGA TACAAGGGAT 7140 

TTAATAATAT GACAAAAGAA AATTTGCAAA ACGCTCCTCA AGATGCGACC GCTTTACTTG 7200 

CGGAATTAAG CAACAATCAA ACTCCCCTGC GAATATTTAA ACAACCACGC AAGCCCAGCC 7260 

TATTACGCTT GGAACAACAT ATCGCAAAAA AAGATTATGA GTTTGCTTGT CGTGAATTAA 7320 

TGGTGATTCT GGAAAAAATG GACGCTAATT TTGGAGGOGT TCACGATATT GAATTTGACG 7380 

CACCCGCTCA GCTGGCATAT CTACCCGAAA AATTACTAAT TTATTTTGCC ACTCGTCTCG 7440 

CTAATGCAAT TACAACACTC TTTTCCGACC CCGAATTGGC AATTTCTGAA GAAGGGGCGT 7500 

TAAAGATGAT TAGCCTGCAA CGCTGGTTGA CGCTGATTTT TGCCTCTTCC CCCTACGTTA 7560 

ACGCAGACCA TATTCTCAAT AAATATAATA TCAACCCAGA TTCCGAAGGT GGCTTTCATT 7620 

TAGCAACAGA OUICTCTTCT ATTGCTAAAT TCTGTATTTT TTACTTACCC GAATCCAATG 7680 

TCAATATGAG TTTAGATOCG TTATGGGCAG GGAATCAACA ACTTTGTGCT TCATTGTGTT 7740 

TTGCGTTGCA GTCTTCACGT TTTATTGGTA CCGCATCTCC GTTTCATAAA AOAGOGGTGG 7800 

TTTTACAGTG GTTTCCTAAA AAACTCGCCG AAATTGCTAA TTTAGATGAA TTGCCTGCAA 7860 

ATATCCTTCA TGATGTATAT ATGCACTGCA GTTATGATTT AGCAAAAAAC AAGCACGATG 7920 

TTAAGCGTCC ATTAAACGAA CTTGTCCGCA AGCATATCCT CACGCAAGGA TGGCAAGACC 7980 

GCTACCTTTA CACCTTAGGT AAAAAGGACG GCAAACCTGT GATGATGGTA CTGCTTGAAC 8040 

ATTTTAATTC GGGACATTCG ATTTATCGTA CACATTCAAC TTCAATGATT GCTGCTCGAG 8100 

AAAAATTCTA TTTAGTCGGC TTAGGCCATG AGGGCGTTGA TAAAATAGGT CGAGAAGTGT 8160 

TTGACGAGTT CTTTGAAATC AGTAGCAATA ATATAATGGA GAGACTGTTT TTTATCCGTA 8220 
AACAGTGCGA AACTTTCCAA CCCGCAGTGT TCTATATGCC AAGCATTGGC ATGGATATTA 8280

CCACGATTTT TGTGAGCAAC ACTCGGCTTG CCCCTATTCA AGCTGTAGCC CTCGGTCATC 8340 

CTGCCACTAC GCATTCTGAA TTTATTGATT ATGTCATCGT AGAAGATGAT TATGTGGGCA 8400 

GTGAAGATTG TTTCAGCOAA ACCCTTTTAC GCTTACCCAA AGATGCCCTA CCTTATOTAC 8460 

CTTCTGCACT CGCCCCACAA AAAGTGGATT ATGTACTCAG GGAAAACCCT GAAGTAGTCA 8520 

ATATCGGTAT TGCCGCTACC ACAATGAAAT TAAACCCTGA ATTTTTGCTA ACATTGCAAG 8580 

AAATCAGAGA TAAAGCTAAA GTCAAAATAC ATTTTCATTT CGCACTTGGA CAATCAACAG 8640 

GCTTGACACA CCCTTATGTC AAATGGTTTA TCGAAAGCTA TTTAGGTGAC GATGCCACTG 8700 

CACATCCCCA CGCACCTTAT CACGATTATC TGGCAATATT GCGTGATTGC GATATGCTAC 8760 

TAAATCCGTT TCCTTTCGGT AATACTAACG GCATAATTGA TATGGTTACA TTAGGTTTAG 8820 

TTCGTOTATG CAAAACGGGG GATGAAGTAC ATGAACATAT TGATGAAGGT CTGTTTAAAC 8880 

GCTTAGGACT ACCAGAATGG CTGATAGCCG ACACACGAGA AACATATATT GAATGTGCTT 8940 

TGCGTCTAGC AGAAAACCAT CAAGAACGCC TTGAACTCCG TCGTTACATC ATAGAAAACA 9000 

ACGGCTTACA AAAGCTTTTT ACAGGCGACC CTCGTCCATT GGGCAAAATA CTGCTTAAGA 9060 

AAACAAATGA ATGGAAGCGG AAGCACTTGA GTAAAAAATA ACGGTTTTTT AAAGTAAAAG 9120 

TGCGGTTAAT TTTCAAAGCG TTTTAAAAAC CTCTCAAAAA TCAACCGCAC TTTTATCTTT 9180 

ATAACGATCC CGCACGCTGA CAGTTTATCA GCCTCCCGCC ATAAAACTCC GCCTTTCATG 9240 

GCGGAGATTT TAGCCAAAAC TGGCAGAAAT TAAAGGCTAA AATCACCAAA TTGCACCACA 9300 

AAATCACCAA TACCCACAAA AAA 9323 



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4794 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION : SEQ ID NO: 7: 

ATGAACAAGA TATATCGTCT CAAATTCAGC AAACGCCTGA ATGCTTTGGT TGCTGTGTCT 60 

GAATTGACAC GGGGTTOTGA CCATTCCACA GAAAAAGGCA GTGAAAAACC TGTTCGTACG 120 

AAAGTACGCC ACTTGGCGTT AAAGCCACTT TCCGCTATAT TGCTATCTTT GGGCATGGCA 180 

TCCATTCCGC AATCTGTTTT AGCGAGOGGT TTACAGGGAA TGAGCGTCGT ACACGGTACA 240 

GCAACCATGC AAGTAGACGG CAATAAAACC ACTATCOGTA ATAGCGTCAA TGCTATCATC 300 

AATTGGAAAC AATTTAACAT TGACCAAAAT GAAATGGTGC AGTTTTTACA AGAAAGCAGC 360 

AACTCTGCCG TTTTCAACCG TGTTACATCT GACCAAATCT CCCAATTAAA AGGGATTTTA 420 
GATTCTAACG GACAAGTCTT TTTAATCAAC CCAAATGGTA TCACAATAGG TAAAGACGCA 480

ATTATTAACA CTAATGGCTT TACTGCTTCT ACQCTAGACA TTTCTAACGA AAACATCAAG 540 

GCGCGTAATT TCACCCTTGA GCAAACCAAG GATAAAGCAC TCGCTGAAAT CGTGAATCAC 600 

GGTTTAATTA CCGTTGGTAA AGACGGTAGC GTAAACCTTA TTGGTGGCAA AGTGAAAAAC 660 

GAGGGCGTGA TTAGCGTAAA TGGCGGTAGT ATTTCTTTAC TTGCAGGGCA AAAAATCACC 720 

ATCAGCGATA TAATAAATCC AACCATCACT TACAGCATTG CTGCACCTGA AAACGAAGCG 780 

RTCAATCTGG GCGATATTTT TGCCAAAGGT GGTAACATTA ATGTCCGCGC TGCCACTATT 840 

CGCAATAAAG GTAAACTTTC TCCCGACTCT GTAAGCAAAG ATAAAAGTGG TAACATTGTT 900

CTCTCTGCCA AAGAAGGTCA AGGGGAAATT GGCGGTGTAA TTTCCGCTCA AAATCAGCAA 960 

GCCAAAGGTG GTAAGTTGAT GATTACAGGC GATAAAGTTA CATTGAAAAC GGGTGCAGTT 1020 

ATCGACCTTT CGGGTAAAGA AGGGGGAGAA ACTTATCTTG GOGGTGACGA GCGTGGOGAA 1080 

GGTAAAAACG GCATTCAATT AGCAAAGAAA ACCACTTTAG AAAAAGGCTC AACAATTAAT 1140 

GTGTCAGGTA AAGAAAAAGG TGGGCOCGCT ATTGTATGGG GCOATATTGC GTTAATTGAC 1200 

GGCAATATTA ATGCCCAAGG TAAAGATATC GCTAAAACTG GTGGTTTTGT GGAGACGTCG 1260 

GGGCATTACT TATCCATTQA TGATAACGCA ATTGTTAAAA CAAAAGAATG GCTACTAGAC 1320 

CCAGAGAATG TGACTATTGA AGCTCCTTCC GCTTCTCGOG TCGAGCTGGG TGCCGATAGG 1380 

AATTCCCACT CGGCAGAGGT GATAAAAGTG ACCCTAAAAA AAAATAACJVC CTOCTTGACA 1440 

ACACTAACCA ATACAACCAT TTCAAATCTT CTGAAAAGTG CCCACGTOGT GAACATAAOG 1500 

GCAAGGAGAA AACTTACCGT TAATAGCTCT ATCAGTATAG AAAGAGGCTC CCACTTAATT 1560 

CTCCACAGTG AAGGTCAGGO COGTCAAGGT GTTCAGATTG ATAAAGATAT TACTTCTGAA 1620 

OGCGGAAATT TAACCATTTA TTCTGGCGGA TOGGTTGATG TTCATAAAAA TATTACGCTT 1680 

GGTAGOGGCT TTTTAAACAT CACAACTAAA GAAGGAGATA TCGCCTTCGA AGACAAGTCT 1740 

GGACGGAACA ACCTAACCAT TACAGCCCAA GGGACCATCA CCTCAGGTAA TAGTAACGGC 1800 

TTTAGATTTA ACAACGTCTC TCTAAACAGC CTCGGCGGAA AGCTGAGCTT TACTGACAGC 1860 

AGAGAGGACA GAGGTAGAAG AACTAAGGGT AATATCTCAA ACAAATTTGA CGGAACGTTA 1920 

AACATTTCCG GAACTGTAGA TATCTCAATG AAAGCACCCA AAGTCAGCTG GTTTTACAGA 1980 

GACAAAGGAC GCACCTACTG GAACGTAACC ACTTTAAATG TTACCTCGGG TAGTAAATTT 2040 

AACCTCTCCA TTGACAGCAC AGGAAGTGGC TCAACAGGTC CAAGCATACG CAATGCAGAA 2100 

TTAAATGGCA TAACATTTAA TAAAGCCACT TTTAATATCG CACAAGGCTC AACAGCTAAC 2160 

TTTAGCATCA AGGCATCAAT AATGCCCTTT AAGAGTAACG CTAACTACGC ATTATTTAAT 2220 

GAAGATATTT CAGTCTCAGG GGGGGGTAGC CTTAATTTCA AACTTAACGC CTCATCTAGC 2280 

AACATACAAA CCCCXGGOGT AATTATAAAA TCTCAAAACT TTAATGTCTC AGGAGGGTCA 2340 

ACTTTAAATC TCAAGGCTGA AGGTTCAACA GAAACCGCTT TTTCAATAGA AAATGATTTA 2400 

AACTTAAACG CCACCGGTGG CAATATAACA ATCAGACAAG TCGAGGGTAC CGATTCACGC 2460 
GTCAACAAAG GTGTCGCAGC CAAAAAAAAC ATAACTTTTA AAGGGGGTAA TATCACCTTC 2520

GGCTCTCAAA AAGCCACAAC AGAAATCAAA GGCAATGTTA CCATCAATAA AAACACTAAC 2580

GCTACTCTTT GTGGTGCGAA TTTTGCCGAA AACAAATCGC CTTTAAATAT AGCAGGAAAT 2640

GTTATTAATA ATGGCAACCT TACCACTGCC GGCTCCATTA TCAATATAGC CGGAAATCTT 2700

ACTGTTTCAA AAGGCGCTAA CCTTCAAGCT ATAACAAATT ACACTTTTAA TGTAGCCGGC 2760

TCATTTGACA ACAATGGCGC TTCAAACATT TCCATTGCCA GAGGAGGGGC TAAATTTAAA 2820

GATATCAATA ACACCAGTAG CTTAAATATT ACCACCAACT CTGATACCAC TTACCGCACC 2880

ATTATAAAAG GCAATATATC CAACAAATCA GGTGATTTGA ATATTATTGA TAAAAAAAGC 2940

GACGCTGAAA TCCAAATTGG CGGCAATATC TCACAAAAAG AAGGCAATCT CACAATTTCT 3000

TCTGATAAAG TAAATATTAC CAATCAGATA ACAATCAAAG CAGGCGTTGA AGGGGGGCGT 3060

TCTGATTCAA GTGAGGCAGA AAATGCTAAC CTAACTATTC AAACCAAAGA GTTAAAATTG 3120

GCAGGAGACC TAAATATTTC AGGCTTTAAT AAAGCAGAAA TTACAGCTAA AAATGGCAGT 3180

GATTTAACTA TTGGCAATGC TAGCGGTGGT AATGCTGATG CTAAAAAAGT GACTTTTGAC 3240

AAGGTTAAAG ATTCAAAAAT CTCGACTGAC GGTCACAATG TAACACTAAA TAGCGAAGTG 3300

AAAACGTCTA ATGGTAGTAG CAATGCTGGT AATGATAACA GCACCGGTTT AACCATTTCC 3360

GCAAAAGATG TAACGGTAAA CAATAACGTT ACCTCCCACA AGACAATAAA TATCTCTGCC 3420

GCAGCAGGAA ATGTAACAAC CAAAGAAGGC ACAACTATCA ATGCAACCAC AGGCAGCGTG 3480

GAAGTAACTG CTCAAAATGG TACAATTAAA GGCAACATTA CCTCGCAAAA TGTAACAGTG 3540

ACAGCAACAG AAAATCTTGT TACCACAGAG AATGCTCTCA TTAATGCAAC CAGCGGCACA 3600

GTAAACATTA GTACAAAAAC AGGGGATATT AAAGGTGGAA TTGAATCAAC TTCCGGTAAT 3660

GTAAATATTA CAGCGAGCGG CAATACACTT AAGGTAAGTA ATATCACTGG TCAAGATGTA 3720

ACAGTAACAG CGGATGCAGG AGCCTTGACA ACTACAGCAG GCTCAACCAT TAGTGCGACA 3780

ACAGGCAATG CAAATATTAC AACCAAAACA GGTGATATCA ACGGTAAAGT TGAATCCAGC 3840

TCCGGCTCTG TAACACTTGT TGCAACTGGA GCAACTCTTG CTGTAGGTAA TATTTCAGGT 3900

AACACTGTTA CTATTACTGC GGATAGCGGT AAATTAACCT CCACAGTAGG TTCTACAATT 3960

AATGGGACTA ATAGTGTAAC CACCTCAAGC CAATCAGGCG ATATTGAAGG TACAATTTCT 4020

GGTAATACAG TAAATGTTAC AGCAAGCACT GGTGATTTAA CTATTGGAAA TAGTGCAAAA 4080

GTTGAAGCGA AAAATGGAGC TGCAACCTTA ACTGCTGAAT CAGGCAAATT AACCACCCAA 4140

ACAuuvrru lit JtnUUUimUll 4200

ATCGCAGGAA ACATTAATGC TGCTAATGTG ACGTTAAATA CCACAGGCAC TTTAACTACT 4260

ACAGGGGATT CAAAGATTAA CGCAACCAGT GGTACCTTAA CAATCAATGC AAAAGATGCC 4320

AAATTAGATG GTGCTGCATC AGGTGACCGC ACAGTAGTAA ATGCAACTAA CGCAAGTCGC 4380

TCTGGTAACG TGACTGCGAA AACCTCAAGC AGCGTGAATA TCACCGGGGA TTTAAACACA 4440

ATAAATGGGT TAAATATCAT TTCGGAAAAT GGTAGAAACA CTGTGCGCTT AAGAGGCAAG 4500

GAAATTGATG TGAAATATAT CCAACCAGGT GTAGCAAGCG TAGAAGAGGT AATTGAAGCG 4560

AAACGCGTCC TTGAGAAGGT AAAAGATTTA TCTGATGAAG AAAGAGAAAC ACTAGCCAAA 4620 

CTTGGTGTAA GTGCTGTACG TTTCGTTGAG CCAAATAATG CCATTACGGT TAATACACAA 4680 

AACGAGTTTA CAACCAAACC ATCAAGTCAA GTGACAATTT CTGAAGGTAA GGCGTGTTTC 4740 

TCAAGTGGTA ATGGCGCACG AGTATGTACC AATGTTGCTG ACGATGGACA GCAG 4794 
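Because every nucleotide row above ends with a running base count, OCR damage such as the stray letters O, Q, M or X that appear in several rows can be flagged mechanically. The sketch below is a hypothetical helper (`check_rows` is an illustrative name, not part of any standard tool); it assumes each row is a run of space-separated base blocks followed by the cumulative position, and it only reports problems, it never guesses the correct base.

```python
VALID_BASES = set("ACGT")


def check_rows(rows, start=0):
    """Check sequence-listing rows of the form '<base blocks> <running count>'.

    `start` is the position reached before the first row. Returns a list of
    (row_index, problem) tuples; an empty list means every row's trailing
    number equals the cumulative base count and every base is A, C, G or T.
    """
    problems = []
    position = start
    for index, row in enumerate(rows):
        *blocks, count = row.split()          # last field is the running count
        bases = "".join(blocks)
        position += len(bases)
        if not count.isdigit() or int(count) != position:
            problems.append((index, f"running count {count!r} != {position}"))
        bad = sorted(set(bases) - VALID_BASES)  # letters that are not bases
        if bad:
            problems.append((index, f"non-ACGT characters {bad}"))
    return problems


# Last two rows of SEQ ID NO: 7 (60 + 54 bases after position 4680).
last_rows = [
    "AACGAGTTTA CAACCAAACC ATCAAGTCAA GTGACAATTT CTGAAGGTAA GGCGTGTTTC 4740",
    "TCAAGTGGTA ATGGCGCACG AGTATGTACC AATGTTGCTG ACGATGGACA GCAG 4794",
]
print(check_rows(last_rows, start=4680))  # []

# The 1080 row of SEQ ID NO: 7, which contains the OCR letter "O".
damaged_row = [
    "ATCGACCTTT CGGGTAAAGA AGGGGGAGAA ACTTATCTTG GOGGTGACGA GCGTGGOGAA 1080",
]
print(check_rows(damaged_row, start=1020))  # [(0, "non-ACGT characters ['O']")]
```

The check is deliberately conservative: a flagged `O` could stand for either C or G, so the original image or an independent sequence record would be needed to resolve it.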

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4803 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 



ATGAACAAGA TATATOGTCT CAAATTCAGC AAACGCCTGA ATGCTTTGGT TGCTGTGTCT 60

GAATTGACAC GGGGTTGTGA CCATTCCACA GAAAAAGGCA GTGAAAAACC TGTTCGTACG 120

AAAGTACGCC ACTTGGCGTT AAAGCCACTT TCCGCTATAT TGCTATCTTT GGGCATGGCA 180

TCCATTCCGC AATCTGTTTT AGCGAGCGGT TTACAGGGAA TGAGCGTCGT ACACGGTACA 240

GCAACCATGC AAGTAGACGG CAATAAAACC ACTATCCGTA ATAGCGTCAA TCCTATCATC 300

AATTGGAAAC AATTTAACAT TGACCAAAAT GAAATGGTGC AGTTTTTACA AGAAAGCAGC 360

AACTCTGCCG TTTTCAACCG TGTTACATCT GACCAAATCT CCCAATTAAA AGGGATTTTA 420

GATTCTAACG GACAAGTCTT TTTAATCAAC CCAAATGGTA TCACAATAGG TAAAGACGCA 480

ATTATTAACA CTAATGGCTT TACTGCTTCT ACGCTAGACA TTTCTAAOGA AAACATCAAG 540

GOGCGTAATT TCACCCTTGA GCAAACCAAG GATAAAGCAC TCGCTGAAAT CGTGAATCAC 600

GGTTTAATTA CCGTTGGTAA AGACGGTAGC GTAAACCTTA TTGGTGGCAA AGTGAAAAAC 660

GAGGGCGTGA TTAGCGTAAA TGGCGGTAGT ATTTCTTTAC TTGCAGGGCA AAAAATCACC 720

ATCAGCGATA TAATAAATCC AACCATCACT TACAGCATTG CTGCACCTGA AAACGAAGCG 780

ATCAATCTGG GCGATATTTT TGCCAAAGGT GGTAACATTA ATGTCOGCGC TGCCACTATT 840

CGCAATAAAG GTAAACTTTC TGCCGACTCT GTAAGCAAAG ATAAAAGTGG TAACATTGTT 900

CTCTCTGCCA AAGAAGGTGA AGCGGAAATT GGCGGTGTAA TTTCCGCTCA AAATCAGCAA 960

GCCAAAGGTG GTAAGTTCAT GATTACAGGT GATAAAGTCA CATTAAAAAC AGGTGCAGTT 1020

ATCGACCTTT CAGGTAAAGA AGGGGGAGAG ACTTATCTTG GCGGTGATGA GCGTGGCGAA 1080

GGTAAAAATG GTATTCAATT AGCGAAGAAA ACCTCTTTAG AAAAAGGCTC GACAATTAAT 1140

GTATCAGGCA AAGAAAAAGG CGGGCGCGCT ATTGTATGGG GCGATATTGC ATTAATTAAT 1200

GGTAACATTA ATGCTCAAGG TAGCGATATT GCTAAAACTG GCGGCTTTGT GGAAACATCA 1260
GGACATGACT TATCCATTGG TCATGATGTG ATTGTTGACG CTAAAGAGTG GTTATTAGAC 1320

CCAGATGATG TGTCCATTGA AACTCTTACA TCTGGACGCA ATAATACCGG CGAAAACCAA 1380 

GGATATACAA CAGGAGATGG GACTAAAGAG TCACCTAAAG GTAATAGTAT TTCTAAACCT 1440 

ACATTAACAA ACTCAACTCT TGAGCAAATC CTAAGAAGAG GTTCTTATGT TAATATCACT 1500

GCTAATAATA GAATTTATGT TAATAGCTCC ATCAACTTAT CTAATGGCAG TTTAACACTT 1560 

CACACTAAAC GAGATGGAGT TAAAATTAAC GGTGATATTA CCTCAAACGA AAATGGTAAT 1620 

TTAACCATTA AAGCAGGCTC TTGGGTTGAT GTTCATAAAA ACATCACGCT TGGTACGGGT 1680 
TTTTTGAATA TTGTCGCTGG GGATTCTGTA GCTTTTGAGA GAGAGGGCGA TAAAGCACGT 1740

AACGCAACAG ATGCTCAAAT TACCGCACAA GGGACGATAA CCGTCAATAA AGATGATAAA 1800 

CAATTTAGAT TCAATAATGT ATCTATTAAC GGGACGGGCA AGGGTTTAAA GTTTATTGCA 1860

AATCAAAATA ATTTCACTCA TAAATTTGAT GGCGAAATTA ACATATCTGG AATAGTAACA 1920 

ATTAACCAAA CCACGAAAAA AGATGTTAAA TACTGGAATG CATCAAAAGA CTCTTACTGG 1980 

AATGTTTCTT CTCTTACTTT GAATACGGTG CAAAAATTTA CCTTTATAAA ATTCGTTGAT 2040 

AGCGGCTCAA ATTCCCAAGA TTTGAGGTCA TCACGTAGAA GTTTTGCAGG CGTACATTTT 2100 

AACGGCATCG GAGGCAAAAC AAACTTCAAC ATCGGAGCTA ACGCAAAAGC CTTATTTAAA 2160 

TTAAAACCAA ACGCCGCTAC AGACCCAAAA AAAGAATTAC CTATTACTTT TAACGCCAAC 2220 

ATTACAGCTA CCGGTAACAG TGATAGCTCT GTGATGTTTG ACATACACGC CAATCTTACC 2280 

TCTAGAGCTG COGGCATAAA CATGGATTCA ATTAACATTA CCGGCGGGCT TGACTTTTCC 2340 

ATAACATCCC ATAATCGCAA TAGTAATGCT TTTOAAATCA AAAAAGACTT AACTATAAAT 2400 

GCAACTGGCT CGAATTTTAG TCTTAAGCAA ACOAAAGATT CTTTTTATAA TGAATACAGC 2460 

AAACACGCCA TTAACTCAAG TCATAATCTA ACCATTCTTG GCGGCAATGT CACTCTAGGT 2520 

OGGGAAAATT CAAGCAGTAG CATTAOGGGC AATATCAATA TCACCAATAA AGCAAATGTT 2580 

ACATTACAAG CTGACACCAG CAACAGCAAC ACAGGCTTGA AGAAAAGAAC TCTAACTCTT 2640 

GGCAATATAT CTGTTGAGGG GAATTTAAGC CTAACTGGTG CAAATGCAAA CATTGTCGGC 2700 

AATCTTTCTA TTGCAOAAQA TTCCACATTT AAAGGAGAAG CCAGTGACAA CCTAAACATC 2760

ACCGGCACCT TTACCAACAA CGGTACCGCC AACATTAATA TAAAACAAGG AGTGGTAAAA 2820 

CTCCAAGGCG ATATTATCAA TAAAGGTGGT TTAAATATCA CTACTAACGC CTCAGGCACT 2880 

CAAAAAACCA fTATTAACGG AAATATAACT AACGAAAAAG GCGACTTAAA CATCAAGAAT 2940 

ATTAAAGCCG AOGCOGAAAT CCAAATTGGC GGCAATATCT CACAAAAAGA AGGCAATCTC 3000 

ACAATTTCTT CTGATAAAGT AAATATTACC AATCAGATAA CAATCAAAGC AGGCGTTGAA 3060 

GGGGGGCGTT CTGATTCAAG TGAGGCAGAA AATGCTAACC TAACTATTCA AACCAAAGAG 3120 

TTAAAATTGG CAGGAGACCT AAATATTTCA GGCTTTAATA AAGCAGAAAT TACAGCTAAA 3180 

AATGGCAGTG ATTTAACTAT TGGCAATGCT AGCGGTGGTA ATGCTGATGC TAAAAAAGTG 3240 

ACTTTTGACA AGGTTAAAGA TTCAAAAATC TCGACTGACG GTCACAATGT AACACTAAAT 3300 
AGCGAAGTGA AAACGTCTAA TGGTAGTAGC AATGCTGGTA ATGATAACAG CACCGGTTTA 3360

ACCATTTCCG CAAAAGATGT AACGGTAAAC AATAAOGTTA CCTCCCACAA GACAATAAAT 3420

ATCTCTGCCG CAGCAGGAAA TGTAACAACC AAAGAAGGCA CAACTATCAA TGCAACCACA 3480

GGCAGCGTGG AAGTAACTGC TCAAAATGGT ACAATTAAAG GCAACATTAC CTCGCAAAAT 3540

GTAACAGTGA CAGCAACAGA AAATCTTGTT ACCACAGAGA ATGCTGTCAT TAATGCAACC 3600

AGCGGCACAG TAAACATTAG TACAAAAACA GGGGATATTA AAGGTGGAAT TGAATCAACT 3660

TCCGGTAATG TAAATATTAC AGCGAGCGGC AATACACTTA AGGTAAGTAA TATCACTGGT 3720

CAAGATGTAA CAGTAACAGC GGATGCAGGA GCCTT6ACAA CTACAGCAGG CTCAACCATT 3780

AGTGCGACAA CAGGCAATGC AAATATTACA ACCAAAACAG GTGATATCAA CGGTAAAGTT 3840

GAATCCAGCT CCGGCTCTGT AACACTTGTT GCAACTGGAG CAACTCTTGC TGTAGGTAAT 3900

ATTTCAGGTA ACACTGTTAC TATTACTGCG GATAGCGGTA AATTAACCTC CACAGTAGGT 3960

TCTACAATTA ATGGGACTAA TAGTGTAACC ACCTCAAGCC AATCAGGCGA TATTGAAGGT 4020

ACAATTTCTG GTAATACAGT AAATGTTACA GCAAGCACTG GTGATTTAAC TATTGGAAAT 4080

AGTGCAAAAG TTGAAGCGAA AAATGGAGCT GCAACCTTAA CTGCTGAATC AGGCAAATTA 4140

ACCACCCAAA CAGGCTCTAG CATTACCTCA AGCAATGGTC AGACAACTCT TACAGCCAAG 4200

GATAGCAGTA TCGCAGGAAA CATTAATGCT GCTAATGTGA CGTTAAATAC CACAGGCACT 4260

TTAACTACTA CAGGGGATTC AAAGATTAAC GCAACCAGTG GTACCTTAAC AATCAATGCA 4320

AAAGATGCCA AATTAGATGG TGCTGCATCA GGTGACCGCA CAGTAGTAAA TGCAACTAAC 4380

GCAAGTGGCT CTGGTAACGT GACTGCGAAA ACCTCAAGCA GCGTGAATAT CACCGGGGAT 4440

TTAAACACAA TAAATGGGTT AAATATCATT TCGGAAAATG GTAGAAACAC TGTGCGCTTA 4500

AGAGGCAAGG AAATTGATGT GAAATATATC CAACCAGGTG TAGCAAGCGT AGAAGAGGTA 4560

ATTGAAGCGA AACGCGTCCT TGAGAAGGTA AAAGATTTAT CTGATGAAGA AAGAGAAACA 4620

CTAGCCAAAC TTGGTGTAAG TGCTGTACGT TTCGTTGAGC CAAATAATGC CATTACGGTT 4680

AATACACAAA ACGAGTTTAC AACCAAACCA TCAAGTCAAG TGACAATTTC TGAAGGTAAG 4740

GCGTGTTTCT CAAGTGGTAA TGGCGCACGA GTATGTACCA ATGTTGCTGA CGATGGACAG 4800

CAG 4803



(2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1599 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15

Val Ala Val Ser Glu Leu Thr Arg Gly Cys Asp His Ser Thr Glu Lys
20 25 30

Gly Ser Glu Lys Pro Val Arg Thr Lys Val Arg His Leu Ala Leu Lys
35 40 45

Pro Leu Ser Ala Ile Leu Leu Ser Leu Gly Met Ala Ser Ile Pro Gln
50 55 60

Ser Val Leu Ala Ser Gly Leu Gln Gly Met Ser Val Val His Gly Thr
65 70 75 80

Ala Thr Met Gln Val Asp Gly Asn Lys Thr Thr Ile Arg Asn Ser Val
85 90 95

Asn Ala Ile Ile Asn Trp Lys Gln Phe Asn Ile Asp Gln Asn Glu Met
100 105 110

Glu Gln Phe Leu Gln Glu Ser Ser Asn Ser Ala Val Phe Asn Arg Val
115 120 125

Thr Ser Asp Gln Ile Ser Gln Leu Lys Gly Ile Leu Asp Ser Asn Gly
130 135 140

Gln Val Phe Leu Ile Asn Pro Asn Gly Ile Thr Ile Gly Lys Asp Ala
145 150 155 160

Ile Ile Asn Thr Asn Gly Phe Thr Ala Ser Thr Leu Asp Ile Ser Asn
165 170 175

Glu Asn Ile Lys Ala Arg Asn Phe Thr Leu Glu Gln Thr Lys Asp Lys
180 185 190

Ala Leu Ala Glu Ile Val Asn His Gly Leu Ile Thr Val Gly Lys Asp
195 200 205

Gly Ser Val Asn Leu Ile Gly Gly Lys Val Lys Asn Glu Gly Val Ile
210 215 220

Ser Val Asn Gly Gly Ser Ile Ser Leu Leu Ala Gly Gln Lys Ile Thr
225 230 235 240

Ile Ser Asp Ile Ile Asn Pro Thr Ile Thr Tyr Ser Ile Ala Ala Pro
245 250 255

Glu Asn Glu Ala Ile Asn Leu Gly Asp Ile Phe Ala Lys Gly Gly Asn
260 265 270

Ile Asn Val Arg Ala Ala Thr Ile Arg Asn Lys Gly Lys Leu Ser Ala
275 280 285

Asp Ser Val Ser Lys Asp Lys Ser Gly Asn Ile Val Leu Ser Ala Lys
290 295 300

Glu Gly Glu Ala Glu Ile Gly Gly Val Ile Ser Ala Gln Asn Gln Gln
305 310 315 320

Ala Lys Gly Gly Lys Leu Met Ile Thr Gly Asp Lys Val Thr Leu Lys
325 330 335

Thr Gly Ala Val Ile Asp Leu Ser Gly Lys Glu Gly Gly Glu Thr Tyr
340 345 350

Leu Gly Gly Asp Glu Arg Gly Glu Gly Lys Asn Gly Ile Gln Leu Ala
355 360 365

Lys Lys Thr Thr Leu Glu Lys Gly Ser Thr Ile Asn Val Ser Gly Lys
370 375 380

Glu Lys Gly Gly Arg Ala Ile Val Trp Gly Asp Ile Ala Leu Ile Asp
385 390 395 400

Gly Asn Ile Asn Ala Gln Gly Lys Asp Ile Ala Lys Thr Gly Gly Phe
405 410 415

Val Glu Thr Ser Gly His Tyr Leu Ser Ile Asp Asp Asn Ala Ile Val
420 425 430

Lys Thr Lys Glu Trp Leu Leu Asp Pro Glu Asn Val Thr Ile Glu Ala
435 440 445

Pro Ser Ala Ser Arg Val Glu Leu Gly Ala Asp Arg Asn Ser His Ser
450 455 460

Ala Glu Val Ile Lys Val Thr Leu Lys Lys Asn Asn Thr Ser Leu Thr
465 470 475 480

Thr Leu Thr Asn Thr Thr Ile Ser Asn Leu Leu Lys Ser Ala His Val
485 490 495

Val Asn Ile Thr Ala Arg Arg Lys Leu Thr Val Asn Ser Ser Ile Ser
500 505 510

Ile Glu Arg Gly Ser His Leu Ile Leu His Ser Glu Gly Gln Gly Gly
515 520 525

Gln Gly Val Gln Ile Asp Lys Asp Ile Thr Ser Glu Gly Gly Asn Leu
530 535 540

Thr Ile Tyr Ser Gly Gly Trp Val Asp Val His Lys Asn Ile Thr Leu
545 550 555 560

Gly Ser Gly Phe Leu Asn Ile Thr Thr Lys Glu Gly Asp Ile Ala Phe
565 570 575

Glu Asp Lys Ser Gly Arg Asn Asn Leu Thr Ile Thr Ala Gln Gly Thr
580 585 590

Ile Thr Ser Gly Asn Ser Asn Gly Phe Arg Phe Asn Asn Val Ser Leu
595 600 605

Asn Ser Leu Gly Gly Lys Leu Ser Phe Thr Asp Ser Arg Glu Asp Arg
610 615 620

Gly Arg Arg Thr Lys Gly Asn Ile Ser Asn Lys Phe Asp Gly Thr Leu
625 630 635 640

Asn Ile Ser Gly Thr Val Asp Ile Ser Met Lys Ala Pro Lys Val Ser
645 650 655

Trp Phe Tyr Arg Asp Lys Gly Arg Thr Tyr Trp Asn Val Thr Thr Leu
660 665 670

Asn Val Thr Ser Gly Ser Lys Phe Asn Leu Ser Ile Asp Ser Thr Gly
675 680 685

Ser Gly Ser Thr Gly Pro Ser Ile Arg Asn Ala Glu Leu Asn Gly Ile
690 695 700

Thr Phe Asn Lys Ala Thr Phe Asn Ile Ala Gln Gly Ser Thr Ala Asn
705 710 715 720

Phe Ser Ile Lys Ala Ser Ile Met Pro Phe Lys Ser Asn Ala Asn Tyr
725 730 735

Ala Leu Phe Asn Glu Asp Ile Ser Val Ser Gly Gly Gly Ser Val Asn
740 745 750

Phe Lys Leu Asn Ala Ser Ser Ser Asn Ile Gln Thr Pro Gly Val Ile
755 760 765

Ile Lys Ser Gln Asn Phe Asn Val Ser Gly Gly Ser Thr Leu Asn Leu
770 775 780

Lys Ala Glu Gly Ser Thr Glu Thr Ala Phe Ser Ile Glu Asn Asp Leu
785 790 795 800

Asn Leu Asn Ala Thr Gly Gly Asn Ile Thr Ile Arg Gln Val Glu Gly
805 810 815

Thr Asp Ser Arg Val Asn Lys Gly Val Ala Ala Lys Lys Asn Ile Thr
820 825 830

Phe Lys Gly Gly Asn Ile Thr Phe Gly Ser Gln Lys Ala Thr Thr Glu
835 840 845

Ile Lys Gly Asn Val Thr Ile Asn Lys Asn Thr Asn Ala Thr Leu Arg
850 855 860

Gly Ala Asn Phe Ala Glu Asn Lys Ser Pro Leu Asn Ile Ala Gly Asn
865 870 875 880

Val Ile Asn Asn Gly Asn Leu Thr Thr Ala Gly Ser Ile Ile Asn Ile
885 890 895

Ala Gly Asn Leu Thr Val Ser Lys Gly Ala Asn Leu Gln Ala Ile Thr
900 905 910

Asn Tyr Thr Phe Asn Val Ala Gly Ser Phe Asp Asn Asn Gly Ala Ser
915 920 925

Asn Ile Ser Ile Ala Arg Gly Gly Ala Lys Phe Lys Asp Ile Asn Asn
930 935 940

Thr Ser Ser Leu Asn Ile Thr Thr Asn Ser Asp Thr Thr Tyr Arg Thr
945 950 955 960

Ile Ile Lys Gly Asn Ile Ser Asn Lys Ser Gly Asp Leu Asn Ile Ile
965 970 975

Asp Lys Lys Ser Asp Ala Glu Ile Gln Ile Gly Gly Asn Ile Ser Gln
980 985 990

Lys Glu Gly Asn Leu Thr Ile Ser Ser Asp Lys Val Asn Ile Thr Asn
995 1000 1005

Gln Ile Thr Ile Lys Ala Gly Val Glu Gly Gly Arg Ser Asp Ser Ser
1010 1015 1020

Glu Ala Glu Asn Ala Asn Leu Thr Ile Gln Thr Lys Glu Leu Lys Leu
1025 1030 1035 1040

Ala Gly Asp Leu Asn Ile Ser Gly Phe Asn Lys Ala Glu Ile Thr Ala
1045 1050 1055

Lys Asn Gly Ser Asp Leu Thr Ile Gly Asn Ala Ser Gly Gly Asn Ala
1060 1065 1070

Asp Ala Lys Lys Val Thr Phe Asp Lys Val Lys Asp Ser Lys Ile Ser
1075 1080 1085

Thr Asp Gly His Asn Val Thr Leu Asn Ser Glu Val Lys Thr Ser Asn
1090 1095 1100

Gly Ser Ser Asn Ala Gly Asn Asp Asn Ser Thr Gly Leu Thr Ile Ser
1105 1110 1115 1120

Ala Lys Asp Val Thr Val Asn Asn Asn Val Thr Ser His Lys Thr Ile
1125 1130 1135

Asn Ile Ser Ala Ala Ala Gly Asn Val Thr Thr Lys Glu Gly Thr Thr
1140 1145 1150

Ile Asn Ala Thr Thr Gly Ser Val Glu Val Thr Ala Gln Asn Gly Thr
1155 1160 1165

Ile Lys Gly Asn Ile Thr Ser Gln Asn Val Thr Val Thr Ala Thr Glu
1170 1175 1180

Asn Leu Val Thr Thr Glu Asn Ala Val Ile Asn Ala Thr Ser Gly Thr
1185 1190 1195 1200

Val Asn Ile Ser Thr Lys Thr Gly Asp Ile Lys Gly Gly Ile Glu Ser
1205 1210 1215

Thr Ser Gly Asn Val Asn Ile Thr Ala Ser Gly Asn Thr Leu Lys Val
1220 1225 1230

Ser Asn Ile Thr Gly Gln Asp Val Thr Val Thr Ala Asp Ala Gly Ala
1235 1240 1245

Leu Thr Thr Thr Ala Gly Ser Thr Ile Ser Ala Thr Thr Gly Asn Ala
1250 1255 1260

Asn Ile Thr Thr Lys Thr Gly Asp Ile Asn Gly Lys Val Glu Ser Ser
1265 1270 1275 1280

Ser Gly Ser Val Thr Leu Val Ala Thr Gly Ala Thr Leu Ala Val Gly
1285 1290 1295

Asn Ile Ser Gly Asn Thr Val Thr Ile Thr Ala Asp Ser Gly Lys Leu
1300 1305 1310

Thr Ser Thr Val Gly Ser Thr Ile Asn Gly Thr Asn Ser Val Thr Thr
1315 1320 1325

Ser Ser Gln Ser Gly Asp Ile Glu Gly Thr Ile Ser Gly Asn Thr Val
1330 1335 1340

Asn Val Thr Ala Ser Thr Gly Asp Leu Thr Ile Gly Asn Ser Ala Lys
1345 1350 1355 1360

Val Glu Ala Lys Asn Gly Ala Ala Thr Leu Thr Ala Glu Ser Gly Lys
1365 1370 1375

Leu Thr Thr Gln Thr Gly Ser Ser Ile Thr Ser Ser Asn Gly Gln Thr
1380 1385 1390

Thr Leu Thr Ala Lys Asp Ser Ser Ile Ala Gly Asn Ile Asn Ala Ala
1395 1400 1405

1410 Thr AS ° I«s° ly L6U Thr ? 4 l 2o Thr 

Lys Ile Asn Ala Thr Ser Gly Thr Leu Thr Ile Asn Ala Lys Asp Ala
1425 1430 1435 1440

Lys Leu Asp Gly Ala Ala Ser Gly Asp Arg Thr Val Val Asn Ala Thr
1445 1450 1455

Asn Ala Ser Gly Ser Gly Asn Val Thr Ala Lys Thr Ser Ser Ser Val
1460 1465 1470

Asn Ile Thr Gly Asp Leu Asn Thr Ile Asn Gly Leu Asn Ile Ile Ser
1475 1480 1485

G1U ?fo 0 Gly * Sn Thr LeU *** ^ Glu Asp Val 

1490 1495 1500

His*** " e Pr ° ?cr„ Val Ala Ser Val Glu Glu "*1 He Glu Ala 

1505 1510 1515 1520

Lys Arg Val Leu Glu Lys Val Lys Asp Leu Ser Asp Glu Glu Arg Glu
1525 1530 1535

Thr Leu Ala Lys Leu Gly Val Ser Ala Val Arg Phe Val Glu Pro Asn
1540 1545 1550

Asn Ala Ile Thr Val Asn Thr Gln Asn Glu Phe Thr Thr Lys Pro Ser
1555 1560 1565

Sfir ?«n Val Thr 116 Ser Glu Gly Lys Ma <*■ Phe ser Gly Asn 
1570 1575 1580

Gly Ala Arg Val Cys Thr Asn Val Ala Asp Asp Gly Gln Gln Pro
1585 1590 1595
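As a consistency check between the listings, the first 48 bases of SEQ ID NO: 7 can be translated codon by codon and compared with the first row of SEQ ID NO: 9 (Met Asn Lys Ile ... Asn Ala Leu). The sketch below assumes the standard genetic code and reading frame 1 starting at the initial ATG; `translate`, `CODON_TABLE` and `THREE_LETTER` are illustrative names, not taken from the source. Codons containing OCR debris such as `O` raise `KeyError` instead of being guessed.

```python
BASES = "TCAG"
# Standard genetic code, indexed in TCAG order: TTT, TTC, TTA, TTG, TCT, ..., GGG.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(dna):
    """Translate DNA (spaces allowed) in frame 1 into three-letter residue codes.

    Trailing bases that do not fill a codon are ignored; a codon containing
    anything other than A, C, G or T raises KeyError.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return [THREE_LETTER[CODON_TABLE[seq[i:i + 3]]] for i in range(0, usable, 3)]


# First 48 bases (16 codons) of SEQ ID NO: 7.
seq7_start = "ATGAACAAGA TATATCGTCT CAAATTCAGC AAACGCCTGA ATGCTTTG"
print(" ".join(translate(seq7_start)))
# Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
```

The output matches the first sixteen residues of SEQ ID NO: 9. Building the table from one 64-character string avoids typing 64 dictionary entries by hand.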

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 1600 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15

Val Ala Val Ser Glu Leu Thr Arg Gly Cys Asp His Ser Thr Glu Lys
20 25 30

Gly Ser Glu Lys Pro Val Arg Thr Lys Val Arg His Leu Ala Leu Lys
35 40 45

Pro Leu Ser Ala Ile Leu Leu Ser Leu Gly Met Ala Ser Ile Pro Gln
50 55 60

Ser Val Leu Ala Ser Gly Leu Gln Gly Met Ser Val Val His Gly Thr
65 70 75 80

Ala Thr Met Gln Val Asp Gly Asn Lys Thr Thr Ile Arg Asn Ser Val
85 90 95

Asn Ala Ile Ile Asn Trp Lys Gln Phe Asn Ile Asp Gln Asn Glu Met
100 105 110

Glu Gln Phe Leu Gln Glu Ser Ser Asn Ser Ala Val Phe Asn Arg Val
115 120 125

Thr Ser Asp Gln Ile Ser Gln Leu Lys Gly Ile Leu Asp Ser Asn Gly
130 135 140

Gln Val Phe Leu Ile Asn Pro Asn Gly Ile Thr Ile Gly Lys Asp Ala
145 150 155 160

Ile Ile Asn Thr Asn Gly Phe Thr Ala Ser Thr Leu Asp Ile Ser Asn
165 170 175

Glu Asn Ile Lys Ala Arg Asn Phe Thr Leu Glu Gln Thr Lys Asp Lys
180 185 190

Ala Leu Ala Glu Ile Val Asn His Gly Leu Ile Thr Val Gly Lys Asp
195 200 205

Gly Ser Val Asn Leu Ile Gly Gly Lys Val Lys Asn Glu Gly Val Ile
210 215 220

Ser Val Asn Gly Gly Ser Ile Ser Leu Leu Ala Gly Gln Lys Ile Thr
225 230 235 240

Ile Ser Asp Ile Ile Asn Pro Thr Ile Thr Tyr Ser Ile Ala Ala Pro
245 250 255

Glu Asn Glu Ala Ile Asn Leu Gly Asp Ile Phe Ala Lys Gly Gly Asn
260 265 270

Ile Asn Val Arg Ala Ala Thr Ile Arg Asn Lys Gly Lys Leu Ser Ala
275 280 285

Asp Ser Val Ser Lys Asp Lys Ser Gly Asn Ile Val Leu Ser Ala Lys
290 295 300

Glu Gly Glu Ala Glu Ile Gly Gly Val Ile Ser Ala Gln Asn Gln Gln
305 310 315 320

Ala Lys Gly Gly Lys Leu Met Ile Thr Gly Asp Lys Val Thr Leu Lys
325 330 335

Thr Gly Ala Val Ile Asp Leu Ser Gly Lys Glu Gly Gly Glu Thr Tyr
340 345 350

Leu Gly Gly Asp Glu Arg Gly Glu Gly Lys Asn Gly Ile Gln Leu Ala
355 360 365

Lys Lys Thr Thr Leu Glu Lys Gly Ser Thr Ile Asn Val Ser Gly Lys
370 375 380

Glu Lys Gly Gly Arg Ala Ile Val Trp Gly Asp Ile Ala Leu Ile Asp
385 390 395 400

Gly Asn Ile Asn Ala Gln Gly Ser Asp Ile Ala Lys Thr Gly Gly Phe
405 410 415

Val Glu Thr Ser Gly His Asp Leu Ser Ile Gly Asp Asp Val Ile Val
420 425 430



WO 97/36914 



PCT/US97/04707 






Asp Ala Lys Glu Trp Leu Leu Asp Pro Asp Asp Val Ser Ile Glu Thr
435 440 445

Leu Thr Ser Gly Arg Asn Asn Thr Gly Glu Asn Gln Gly Tyr Thr Thr
450 455 460

Gly Asp Gly Thr Lys Glu Ser Pro Lys Gly Asn Ser Ile Ser Lys Pro
465 470 475 480

Thr Leu Thr Asn Ser Thr Leu Glu Gln Ile Leu Arg Arg Gly Ser Tyr
485 490 495

Val Asn Ile Thr Ala Asn Asn Arg Ile Tyr Val Asn Ser Ser Ile Asn
500 505 510

Leu Ser Asn Gly Ser Leu Thr Leu His Thr Lys Arg Asp Gly Val Lys
515 520 525

Ile Asn Gly Asp Ile Thr Ser Asn Glu Asn Gly Asn Leu Thr Ile Lys
530 535 540

Ala Gly Ser Trp Val Asp Val His Lys Asn Ile Thr Leu Gly Thr Gly
545 550 555 560

Phe Leu Asn Ile Val Ala Gly Asp Ser Val Ala Phe Glu Arg Glu Gly
565 570 575

Asp Lys Ala Arg Asn Ala Thr Asp Ala Gln Ile Thr Ala Gln Gly Thr
580 585 590

Ile Thr Val Asn Lys Asp Asp Lys Gln Phe Arg Phe Asn Asn Val Ser
595 600 605

Leu Asn Gly Thr Gly Lys Gly Leu Lys Phe Ile Ala Asn Gln Asn Asn
610 615 620

Phe Thr His Lys Phe Asp Gly Glu Ile Asn Ile Ser Gly Ile Val Thr
625 630 635 640

Ile Asn Gln Thr Thr Lys Lys Asp Val Lys Tyr Trp Asn Ala Ser Lys
645 650 655

Asp Ser Tyr Trp Asn Val Ser Ser Leu Thr Leu Asn Thr Val Gln Lys
660 665 670

Phe Thr Phe Ile Lys Phe Val Asp Ser Gly Ser Asn Gly Gln Asp Leu
675 680 685

Arg Ser Ser Arg Arg Ser Phe Ala Gly Val His Phe Asn Gly Ile Gly
690 695 700

Gly Lys Thr Asn Phe Asn Ile Gly Ala Asn Ala Lys Ala Leu Phe Lys
705 710 715 720

Leu Lys Pro Asn Ala Ala Thr Asp Pro Lys Lys Glu Leu Pro Ile Thr
725 730 735

Phe Asn Ala Asn Ile Thr Ala Thr Gly Asn Ser Asp Ser Ser Val Met
740 745 750

Phe Asp Ile His Ala Asn Leu Thr Ser Arg Ala Ala Gly Ile Asn Met
755 760 765

Asp Ser Ile Asn Ile Thr Gly Gly Leu Asp Phe Ser Ile Thr Ser His
770 775 780
Asn Arg Asn Ser Asn Ala
785 790 



805 



820 



835 



urXU 


IXv 


Lys 


Lys Asp 
795 


Leu 


Thr 


lie 


Asn 
800 


Leu 


Lys 


Gin 
810 


Thr 


Lys 


Asp 


Ser 


Phe 
815 


Tyr 


lie 


Asn 
825 


Ser 


Ser 


His 


Asn 


Leu 
830 


Thr 


lie 


Gly 
840 


Gly 


Glu 


Asn 


Ser 


Ser 
845 


Ser 


Ser 


He 


Asn 


Lys 


Ala 


Asn 


Val 
860 


Thr 


Leu 


Gin 


Ala 



850 8S5 
Asp Thr Ser Asn Ser Asn Thr Gly Leu Lys Lys Arg Thr Leu Thr Leu 
865 870 8 5 



Gly Asn He Ser Val Glu Gly Asn Leu Ser Leu Thr Gly Ala Asn Ala 

885 83b 

Asn lie Val Gly Asn Leu Ser He Ala Glu Asp Ser Thr Phe Lys Gly 
900 905 91 

Glu Ala Ser Asp Asn Leu Asn He Thr Gly Thr Phe Thr Asn Asn Gly 
915 920 925 

Thr Ala Asn lie Asn He Lys Gly Val Val Lys Leu Gly Asp He Asn 

930 »35 940 

Asn Lys Gly Gly Leu Asn He Thr Thr Asn Ala Ser Gly Thr Gin Lys 
945 950 

Thr lie He Asn Gly Asn He Thr Asn Glu Lys Gly Asp Leu Asn He 
9£5 970 *' 9 



Lys Asn He Lys Ala Asp Ala Glu He Gin lie Gly Gly Asn lie Ser 

I 

995 1000 100S 



980 985 
Gin Lys Glu Gly Asn Leu Thr He Ser Ser Asp Lys Val Asn He Thr 



Asn Gin He Thr He Lys Ala Gly Val Glu Gly Gly Arg Ser Asp Ser 

1010 i<» 15 10 * 

ser Glu Ala Glu Asn Ala Asn Leu Thr He Gin Thr Lys Glu Leu Lys 
1025 I" 30 1035 

Leu Ala Gly Asp Leu Asn He Ser Gly Phe Asn Lys Ala Glu He Thr 
1045 1050 AU " 



Ala Lys Asn GlySer Asp Leu Thr He^ly Asn Ala Ser Gl^Gly Asn 

Ala Asp Ala Lys Lys Val Thr Phe Asp Lys Val Lys Asp Ser Lys He 
1075 1080 

Ser Thr Asp Gly His Asn Val Thr Leu Asn Ser Glu val Lys Thr Ser 

1090 109S 
Asn Gly Ser Ser Asn Ala^ly Asn Asp Asn SerThr Gly Leu Thr H^ 

Ser Ala Lys Asp Val_Thr Val Asn Asn AsnVal Thr Ser His Ly^Thr 



1125 
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lie Asn He Ser Ala Ala Ala Gly Asn Val Thr Thr Lys Glu Gly Thr 
X140 1145 X150 

Thr Ile Asn Ala Thr Thr Gly Ser Val Glu Val Thr Ala Gln Asn Gly
1155 1160 1165

Thr Ile Lys Gly Asn Ile Thr Ser Gln Asn Val Thr Val Thr Ala Thr
1170 1175 1180

Glu Asn Leu Val Thr Thr Glu Asn Ala Val Ile Asn Ala Thr Ser Gly
1185 1190 1195 1200

Thr Val Asn Ile Ser Thr Lys Thr Gly Asp Ile Lys Gly Gly Ile Glu
1205 1210 1215

Ser Thr Ser Gly Asn Val Asn Ile Thr Ala Ser Gly Asn Thr Leu Lys
1220 1225 1230

Val Ser Asn Ile Thr Gly Gln Asp Val Thr Val Thr Ala Asp Ala Gly
1235 1240 1245

Ala Leu Thr Thr Thr Ala Gly Ser Thr Ile Ser Ala Thr Thr Gly Asn
1250 1255 1260

Ala Asn Ile Thr Thr Lys Thr Gly Asp Ile Asn Gly Lys Val Glu Ser
1265 1270 1275 1280

Ser Ser Gly Ser Val Thr Leu Val Ala Thr Gly Ala Thr Leu Ala Val
1285 1290 1295

Gly Asn Ile Ser Gly Asn Thr Val Thr Ile Thr Ala Asp Ser Gly Lys
1300 1305 1310

Leu Thr Ser Thr Val Gly Ser Thr Ile Asn Gly Thr Asn Ser Val Thr
1315 1320 1325

Thr Ser Ser Gln Ser Gly Asp Ile Glu Gly Thr Ile Ser Gly Asn Thr
1330 1335 1340

Val Asn Val Thr Ala Ser Thr Gly Asp Leu Thr Ile Gly Asn Ser Ala
1345 1350 1355 1360

Lys Val Glu Ala Lys Asn Gly Ala Ala Thr Leu Thr Ala Glu Ser Gly
1365 1370 1375

Lys Leu Thr Thr Gln Thr Gly Ser Ser Ile Thr Ser Ser Asn Gly Gln
1380 1385 1390

Thr Thr Leu Thr Ala Lys Asp Ser Ser Ile Ala Gly Asn Ile Asn Ala
1395 1400 1405

Ala Asn Val Thr Leu Asn Thr Thr Gly Thr Leu Thr Thr Thr Gly Asp
1410 1415 1420

Ser Lys Ile Asn Ala Thr Ser Gly Thr Leu Thr Ile Asn Ala Lys Asp
1425 1430 1435 1440

Ala Lys Leu Asp Gly Ala Ala Ser Gly Asp Arg Thr Val Val Asn Ala
1445 1450 1455

Thr Asn Ala Ser Gly Ser Gly Asn Val Thr Ala Lys Thr Ser Ser Ser
1460 1465 1470

Val Asn Ile Thr Gly Asp Leu Asn Thr Ile Asn Gly Leu Asn Ile Ile
1475 1480 1485
Ser Glu Asn Gly Arg Asn Thr Val Arg Leu Arg Gly Lys Glu Ile Asp
1490 1495 1500

Val Lys Tyr Ile Gln Pro Gly Val Ala Ser Val Glu Glu Val Ile Glu
1505 1510 1515 1520

Ala Lys Arg Val Leu Glu Lys Val Lys Asp Leu Ser Asp Glu Glu Arg
1525 1530 1535

Glu Thr Leu Ala Lys Leu Gly Val Ser Ala Val Arg Phe Val Glu Pro
1540 1545 1550

Asn Asn Ala Ile Thr Val Asn Thr Gln Asn Glu Phe Thr Thr Lys Pro
1555 1560 1565

Ser Ser Gln Val Thr Ile Ser Glu Gly Lys Ala Cys Phe Ser Ser Gly
1570 1575 1580

Asn Gly Ala Arg Val Cys Thr Asn Val Ala [illegible] Gln Gln
1585 1590 1595 1600



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Val Asp Glu Val Ile Glu Ala Lys Arg Ile Leu Glu Lys Val Lys Asp
1 5 10 15

Leu Ser Asp Glu Glu Arg Glu Ala Leu Ala Lys Leu Gly
20 25



WO 97/36914 103 PCT/US97/04707

What I claim is: 

1. An isolated and purified nucleic acid molecule
encoding a high molecular weight (HMW) protein, HMW3 or
HMW4, of a non-typeable Haemophilus strain, or a variant
or fragment thereof retaining the immunological
ability to protect against disease caused by a
non-typeable Haemophilus strain, having:

(a) the DNA sequence shown in Figure 8 (SEQ ID No: 7)
and encoding protein HMW3 having the derived
amino acid sequence of Figure 10 (SEQ ID No: 9), or

(b) the DNA sequence shown in Figure 9 (SEQ ID No: 8)
and encoding protein HMW4 having the derived
amino acid sequence of Figure 10 (SEQ ID No: 10).

2. An isolated and purified nucleic acid molecule

encoding a high molecular weight protein (HMW) of a
non-typeable Haemophilus strain, which is selected from the
group consisting of:

(a) a DNA sequence as shown in any one of Figures
8 and 9 (SEQ ID Nos: 7 and 8);

(b) a DNA sequence encoding an amino acid
sequence as shown in Figure 10 (SEQ ID Nos: 9 and
10); or

(c) a DNA sequence encoding a high molecular
weight protein of a non-typeable Haemophilus strain
which hybridizes under stringent conditions to any
one of the DNA sequences of (a) and (b).

3. The nucleic acid molecule of claim 2 wherein the
DNA sequence (c) has at least about 90% identity of
sequence to the DNA sequences (a) or (b).

4. A transformation [illegible] host comprising
the nucleic acid molecule of claim 2.

5. An isolated and purified high molecular weight
(HMW) protein of non-typeable Haemophilus or any variant
or fragment thereof retaining the immunological ability
to protect against disease caused by a non-typeable
Haemophilus strain, which is characterized by at least
one surface-exposed B-cell epitope which is recognized
by monoclonal antibody AD6.

6. The protein of claim 5 which is HMW1 encoded by the
DNA sequence shown in Figure 1 (SEQ ID No: 1), having
the derived amino acid sequence of Figure 2 (SEQ ID No:
2) and having an apparent molecular weight of 125 kDa.

7. The protein of claim 5 which is HMW2 encoded by the
DNA sequence shown in Figure 3 (SEQ ID No: 3) and having
the derived amino acid sequence of Figure 4 (SEQ ID No:
4) and having an apparent molecular weight of 120 kDa.

8. The protein claimed in claim 5 which is HMW3 
encoded by the DNA sequence shown in Figure 8 (SEQ ID 
No: 7) and having the derived amino acid sequence of 
Figure 10 (SEQ ID No: 9) and having an apparent 
molecular weight of 125 kDa. 

9. The protein claimed in claim 5 which is HMW4
encoded by the DNA sequence shown in Figure 9 (SEQ ID
No: 8) and having the derived amino acid sequence shown
in Figure 10 (SEQ ID No: 10) and having an apparent
molecular weight of 123 kDa.

10. A conjugate comprising a protein as claimed in 
claim 5 linked to an antigen, hapten or polysaccharide 
for eliciting an immune response to said antigen, hapten 
or polysaccharide. 

11. The conjugate as claimed in claim 10 wherein said 
polysaccharide is a protective polysaccharide against 
Haemophilus influenzae type b. 

12. A synthetic peptide having an amino acid sequence
containing at least six amino acids and no more than 150
amino acids and corresponding to at least one protective
epitope of a high molecular weight protein HMW1, HMW2,
HMW3 or HMW4 of non-typeable Haemophilus influenzae,

wherein the epitope is recognized by at least one of
monoclonal antibodies AD6 and 10C5.

13. The peptide as claimed in claim 12 wherein the
epitope is located within 75 amino acids of the carboxy
terminus of the HMW1 or HMW2 protein.
1 ATGAACAAGA TATATCGTCT CAAATTC*GC AAACCCCTGA AT6CTTTGGT TGCTCTCTCT CAATTGACAC GSGGTTCT6A CCATTCCACA GAAAAAGGO

101 CTGAAAAACC TGTTCCTACG AAACTACCCC ACUCCCCTT AAACCCACTT TCCCCTATAT TCCTATCTTT CGCCATCCCA TCCATTCCGC AATCTGTTTT

201 ^c^aar ttacaccgaa tcaccctcot acaccgtaca couccatcc aagtacaccg caataaaacc actatccgta ataccctcaa tgctatcatc

301 AATTOCAAAC AATTTAACAT TGAGCAAAAT CAAATCGTGC AGTTTTTACA ACAAAGCACC AACTCTGCCC TTITCAACCC TGTIACATCT CACCAAATCT
401 CCCAATTAAA AG06MTTTA CATTCTAAOG CACAA6TCTT TTTAATCAAC CCAAATGGTA TCACAATAGG TAAAGACOCA ATTATIAACA CTAATGCCf t
501 TACIBCTTCT ACCCTAGACA TTTCIAAC6A AAACATCAAC GCCC8TAATT TCACCCTTGA GCAAACCAAG 6ATAAA6CAC ICCCTCAAAT CGTCAATCAC
601 CCTTTAATTA CCCTTGCTAA ACACCCTACC CTAAACCTTA TTGtTGGCAA ACTCAAAAAC CACGCCGTCA TTAGCGTAAA TGCCGCTAGT ATTTCTTTAC
701 HGCAGGGCA AAAAATCACC ATCACC6ATA TAATAAATCC AACCATCACT TACACCATTC CTCCACCTCA AAACCAAGCG ATCAATCTCG GCCATATTTT 
801 TGCCAAAGGT GGTAACATTA ATGTCCGCCC TGCCACTATT GGCAATAAAG GTAAACTTTC TGCCGACTCt GTAAGCAAAO ATAAAAGTCG TAACATTCTT
901 aCTCTCCCA AACAACGTGA AGCGGAAATT CGCGGTGTAA TTTCCGCICA AAATCAGCAA CCCAAAGGTG GTAAGTTGAI GATTACAGGC GATAAA6TTA
1001 CATTGAAAAC CGGTGCAGTT ATCGACCTTT CCGGTAAAGA AGGCGCAGAA ACTTATCTTG CCGGtGACGA CCGTGCCCAA CGTAAAAACG CCAITCAATT 
1101 AGCAAAGAAA ACCACTTTAG AAAAAGGCTC AACAATTAAT GTGTCAGGTA AAGAAAAAGG TGGGCGCGCT ATTGTATOGG GCGATATTGC GTTAATT6AC 
1201 OCCAATATTA ATCCOCAACG TAAAGATATC 6CTAAAACTG CTGGTTTTGT CGACACCTCG GGGCATTACT TATCCATTGA TGATAAOGCA ATTCTTAAAA 
1301 CAAAAGAATG GCTACTA6AC CCAGAGAATG TGACTATTGA AGCTCCTTCC CCTTCTCGCG TCGACCTGCG TCOCCATAGG AATTCCCACT C6GCABAGGT
1401 GATAAAAGTC ACCCTAAAAA AAAATAACAC aCOTCACA ACACTAACCA ATACAACCAT TTCAAATCTT CTGAAAAGTC CCCACGTOGT GAACATAACS
1501 GCAA6GA6AA AACTTACCGT TAATAGCTCT ATCAGTATAG AAACACGCTC CCACTTAATT CTCCACAGTG AACGTCAGGC CGGICAAGCT CTTCAGATTG 
1601 ATAAAGATAT TACTTCTGAA GCCCGAAATT •TAACCATTTA TTCTGGCGGA TGGGTTCAT6 TTCATAAAAA TATTACGCTT GGTAGOBGCT TTTIAAACAT 
1701 CACAACTAAA GAAGGAGATA TC60STTCGA AGACAAGTCT GGACGGAACA ACCTAACCAT TACAGCCCAA GGGACCATCA CCTCAGGTAA TA6TAAC6CC 
1801 TTTAGATTTA ACAACGTC1C TCTAAACAGC CTTGCCGGAA A6CT6AGCTT TACTGACAGC AGAGAGGACA GAGGTAGAAG AACTAAGGGT AAIATCICAA
1901 CCCAACCTTA AACATTTCCC CAACIGIAGA TATCTCAATG AAAGCAOXA AAGTCAGCTG GTTTIACAGA GACAAAGGAC CCACCTACTG

2001 GAACGTAACC ACTTTAAATG TTACCtCGGG TAGTAAATTT AACCtCTCCA TTGACAGCAC AGGAAGTGGC TCAACACGTC CAACCATACG CAATGCAGAA 
2101 TTAAATCGCA TAACATTTAA TAAAGCCACT TTTAATATCG CACAAGGCTC AACAGCTAAC TTTAGCATCA AGGCATCAAT AATGCCCTTI AAGAGTAACG
2201 CTAACTACGC ATTATTTAAT GAAGATATTT CAGTCTCACC CGCGGGTAGC GTTAATTTCA AACITAACGC CTCATCTACC AACATACAAA CCCCTCCCCT
2301 AATTATAAAA TCTCAAAACT TTAATGTCtC AGGAGCGTCA ACTTTAAATC TCAAGGCTGA AGGTTCAACA CAAACCGCTT nTCAATAGA AAATGATTTA
2401 AACTTAAACG CCACCCGTGG CAATATAACA ATCAGACAAC TOBAGGGTAC CCATTCACGC CICAACAAAG GTCTCCCACC CAAAAAAAAC ATAACTTTTA
2501 AAGGGGGTAA TATCACCTTC GGCTCICAAA AAGCCACAAC AGAAATCAAA GGCAATGTTA CCATCAA1AA AAACACTAAC GCTACTCTTC GTGGTGCGAA 
2601 [illegible] AACAAATCCC CTTTAAATAT AGCAGGAAAT GTTATTAATA ATCGCAACCT TACCACTGCC CGCTCCATTA 1CMTATACC CGCAAATCTT
2701 ACTCTTTCAA AAGGCGCTAA CCTTCAAGC, ATAACAAATT ^ TGTAGCCGGC TCATTTGACA ACAATGGCGC UCAAACAT, «CA„GCCA
2801 ^cc «AAmAAA «TATCAAU ACACCAGTAG cnAAA«n accaccaac, ctgataccac UACCGCACC ATTATAAAAG GCAAUUTC
2901 [illegible] GGTGATTTGA ATAnATTGA TAAAAAAAGC GACGCTGAAA TCCAAATTGG CGGCAATATC TCACAAAAAC AACGCAATCT CACAATUa
3001 TCTCATAAAG TUATATUC CAATCAGATA ACAATCAAAC CACGCGTTGA AGGGGGGCGT ICtCATTCAA GTGAGGCAGA AAATCCTAAC CIAACIATIC
3101 AAACCAAAGA CTTAAAATTC CCAGGAGACC TAAATATTTC ACGCTTTAAT AAACCACAAA TTACAGCTAA AAATCGCAG, CATTTAACTA TTGGCAATCC
3201 TAGCG6TGGT AATGCTGATG CTAAAAAACT GACTTTTGAC AAGGUAAAG ATTCAAAAAT CTCGACrGAC CCTCACAATC TAACACTAAA TAGCGAAGTG
3301 AAAACCTCTA ArCCTACTAG CAATGCTGGT AATGATAACA CCACCGCTTT AACCAUTCC GCAAAAGATG TAACGGTAAA CAATAACGTT ACaCCCACA
3401 AGACAATAAA TATCTCTCCC GCAGCAGGAA ATGTAACAAC CAAAGAAGGC ACAACTATCA ATGCAACCAC ACGCAGCCTC CAAGTAACTC CTCAAAATCG
3501 TACAATTAAA GCCAACATTA CCTCCCAAAA TGTAACAGT6 ACAGCAACAG AAAATCTTGT TACCACAGAB AATCCTGTCA TTAAT6CAAC CAGCGGCACA
3601 CTAAACATTA CTACAAAAAC ACCGGATATT AAAG6TGCAA TtCAATCAAC TTCCCGTAAI CTAAATAm CAGCGAGCGC CAATACACTT AAGSIAAGTA
3701 ATATCACT66 TCAA8ATCTA ACAGTAACA6 CGGATGCAGG ACCCTTCACA ACUCAGCAG CCrCAACCAT TAGTGCGACA ACAGGCAATG CAAATATTAC
3801 AACCAAAACA GGTGATATCA ACOGTAAAGr TCAAICCACC TCCCGCTCTG TAACACTTGT T6CAACI0CA GCAACTCTTG CTGTAGGTAA 7ATTTCAGG1
3901 AACACTCTTA CTATTACTGC CGAIAGCGCT AAATTAACCT CCACAGTA6G TTCTACAATT AATCGGACTA ATAGTGTAAC CACCTCAAGC CAATCAGCCC
4001 ATATTCAAGG TACAATTTCt CGTAATACAG IAAATGTTAC AGCAAGCACT GCTGATTTAA CfATTGGAAA TAGTGCAAAA GTTGAAGCGA AAAAISGAGC
4101 TGCAACCTTA ACTGCTGAAT CAGGCAAATT AACCACCCAA ACAGGCTCTA GCATTACCTC AAGCAATGGI CAGACAACTC TTACACCCAA GGATAGCAGT
4201 ATCGCAGGAA ACATTAATCC TGCTAAT6T6 ACGTTAAATA CCACAGGGAC TTIAACTACT ACAGGGGATT CAAAGATTAA CCCAACCAGT GCTACCTTAA
4301 CAATCAATGC AAAAGATGCC AAATTAGATG CTGCTGCATC AGGTGACCGC ACAGTAGTAA ATGGAACTAA CGCAAGTGGC TCTGGTAACG fGACTGGGAA
4401 AACCTCAACC ACCSTGAATA ICACCGCGGA TTTAAACACA ATAAATGGGT TAAATATCAT TTCGGAAAAT GGTAGAAACA CTGTGCGCTT AAGACGCAAG
4501 GAAATT6ATG TGAAATATAT CCAACCAGGT 6TAGCAAGCG TAGAAGAGGT AATTGAAGCG AAACGCGTCC TTGAGAAGGI AAAAGATTTA TCTGATGAA6
4601 MAGAGAAAC ACTAGCCAAA CTTGGTGIAA GTGCtGlACG TTTCGTIGAG CCAAATAATG CCATIACGGI 1AAIACACAA AACGAG1TTA CAACCAAACC

4701 atcaagtcaa gtgacaattt ctgaacgtaa cccgtctttc tcaagtggta atgccgcacg agtatctacc aatcttcctc acgatccaca GCAG
REFORMAT of: [illegible].gcg check: -1 from: 1 to: 4803 October 5, 1995
(No documentation)

[illegible] Length: 4803 October 5, 1995 [illegible] Type: N Check: 3920

1 ATCAACAAGA TATATCCTCT CAAATTCAGC AAACCCCTCA ATCCTTTCCT TCCrGTCTCT CAATTCACAC CCCGTTCTCA CCATTCCACA CAAAAACCCA 

101 CTCAAAAACC TGTTCGTACG AAAGTACCCC ACTTGGCGTT AAAGCCACTT TCCCCTATAT TCCTATCTTT GCGCATCCCA TCCATTCCGC AATCTCTTTT

201 ACCGACCC6T TTACACCCAA TGAGCGTCGT ACACGGTACA CCAACCATGC AAGTACACGG CAATAAAACC AXTATCCCTA ATAGCGTCAA TCCIATCATC 

301 AATT6CAAAC AATTTAACAT TCACCAAAAT CAAATCCTCC AGTTTTTACA AGAAAGCAGC AACTCTGCCG TTTTCAACCG TGTTACATCT CACCAAATCT

401 CCCAATTAAA AGGGATTTTA GATTCTAACC GACAAGTCTT TTTAATCAAC CCAAATGGTA TCACAATACC TAAAGACCCA ATTATTAACA CTAATCGCTT 

501 TACTCCTTCT ACGCTACACA TTTCTAACCA AAACATCAAC GCCCGTAATT TCACCCTTCA GCAAACCAAC GATAAACCAC rCCCTCAAAT CCTCAATCAC

601 CGTTTAATTA CCCTTCCTAA AGACGGTA6C CTAAACCTTA TTCGTCCCAA ACTCAAAAAC GAGGGCGTCA TTACCCTAAA rCCCCCTAGT ATTTCTTTAC 

701 TTGCACGGCA AAAAATCACC ATCAGCCATA TAATAAATCC AACCATCACT TACACCATTC CTGCACCTGA AAACGAACCG ATCAATCTCG CCGATATTTT 

801 TCCCAAACGT CCTAACATTA ATCTCCCCCC TCCCACTATT CGCAATAAAC GTAAACTTTC TCCCGACTCT CTAACCAAAC ATAAAACTCG TAACATTCTT

901 CTCTCT60CA AAGAACGTGA ACGGGAAATT GCCCGTGTAA TTTCCCCTCA AAATCACCAA CCCAAACCTC CTAACTTCAr CATTACACGT CATAAACTCA 

1001 CATTAAAAAC ACCTCCACTT ATCCACCTTT CACGTAAAGA ACCGGGA&AG ACTTATCTTC GCGGTGATCA CCCTGCCGAA CCTAAAAATG GTATTCAATT 

1101 [illegible] ACCTCTTTAG AAAAACCCTC GACAATTAAT GTATCAGCCA AACAAAAAGG CGGGCGCGCT ATTGTATCGG GCGATATTCC ATTAATTAAT

1201 CCTAACATTA ATGCTCAAGG TACCGATATT GCTAAAACTG GCCCCTTTCT CGAAACATCA GGACATGACT TATCCATTGG TGATCATGTG ATTGTTGACG 

1301 CTAAAGAGTG GTTATTAGAC CCAGATGATG TGTCCATTGA AACTCTTACA TCTCGACGCA ATAATACCGG CGAAAACCAA GGATATACAA CAGGAGATGG

1401 GACTAAAGAG TCACCTAAAG GTAATACTAT TTCTAAACCT ACATTAACAA ACTCAACTCT TCACCAAATC CTAACAACAC CTTCTTATGT TAATATCACT

1501 CCTAATAATA GAATTTATCT TAATAGCTCC ATCAACTTAT CTAATC6CAG TTTAACACTT CACACTAAAC GACATCGAGT TAAAATTAAC GGTCATATTA

1601 CCTCAAACCA AAATGGTAAT TTAACCATTA AACCAGGCTC TTGGGTTGAT GTTCATAAAA ACATCACGCT TGCTACGCCT TTTTTGAATA TTCTCSCTCG 

1701 GGATTCTGTA GCTTTTCAGA CAGAGGGCGA TAAAGCACGT AACGCAACAG ATGCTCAAAT TACOCCACAA GGGACGATAA CCGTCAATAA AGATGATAAA 

1801 CAATTTAGAT TCAATAATCT ATCTATTAAC GCGACGOfiTA AGGGTTTAAA GTTTATTGCA AATCAAAATA ATTTCACTCA TAAATTTGAT CCCCAAATTA

1901 ACATATCTCG AATAOTAACA ATTAACCAAA CCACCAAAAA AGATGTTAAA TACTCCAATG CATCAAAACA CTCTTACTCG AATCTTTCTT CTCTTACTTT 

2001 GAATACGGTG CAAAAATTTA CCTTTATAAA ATTCCTTGAT AGCGGCTCAA ATTCCCAAGA TTTCAGGTCA TCACGTAGAA GTTTTGCAGG CGTACATTTT 

2101 AACCGCATCG [illegible] AAACTTCAAC ATCGGAGCTA ACCCAAAACC CTTATTTAAA TTAAAACCAA ACGCCCCTAC AGACCCAAAA AAAGAATTAC

2201 CTATTACTTT TAACGCCAAC ATTACAGCTA CCCGTAACAG TCATACCTCT GTGATCTTTG ACATACACCC CAATCTTACC TCTACAGCTG CCGGCATAAA 

2301 CATCGATTCA ATTAACATTA CCGGCCC6CT TCACTTTTCC ATAACATCfX ATAATCCCAA TAGTAATCCT TTTCAAATCA AAAAACACTT AACTATAAAT 

2401 GCAACTCGCT CGAATTTTAG TCTTAAGCAA ACGAAAGATT CTTTTTATAA TGAATACAGC AAACACGCCA TTAACTCAAG TCATAATCTA ACCATTCTTG 
2501 cecccMtcr cactctaggt cgggaaaat, ou^ctac aatatcaata tcaccaaua [illegible]

2601 wttCCMC AMCB ctiga tctaactctt ««« ctcttgaggg « ctaactcgtg caamgcaaa cmGrcGcc

2701 wrcmC TA UGXAGAAGA TTCCACATTT AAAGGAGAAG CCAGTGACAA CCtAAACAK ACCGGCACCT TTACGAACAA CGCTACCG«

2801 TAAAACAAGG AGTCGTAAAA CCCAAGGCG AT.UATCAA TAAAGGTGGT TTAAATATCA CTACTAACGC CTCAGGCACT CAAAAAACCA TTAtrAACCG

2901 aaatataact aaccaaaaag ccacrr^M catcaacaat atuaacccc acgccgaaat ccaaattggc calmer cacaaaaaga aggc»atctc

3001 ACTTTCn CTCATAAAGT AAATATTACC AATCAGATAA CAATCAAAGC AGGCGTTGAA GGGGGGCGTT CTGATTCAAC TGAGGCAGAA AATGCTAACC
3101 TAACTATTCA AACCAAAGAC TTAAAATTCG UGGAGACCT ^ CGCTTTAATA AAGCAGAAAT TACAGCTAAA AATGGCAGTG ATTTAACTAT

3201 tggcaatcct a^ogta atgctgatgc taaaaaagtg acttttgaca [illegible]

3301 A^AAGTCA AMCOTCm TGGTAGTAGC AATGCTGGTA ATGATAACAG CACC6GTTTA ACCATTTCCG CAAAAGATGT AACCGTAAAC AATAAC1STTA
3401 CaCCCACAA CACAATAAAT ATCTCTGCCG CAGCAGGAAA TCTAACAACC AAAGAAGCCA CAACTATCAA TCCAACCACA CGCAGCGTGG AAGTAACTGC
3501 TCAAAATC6T ACAATTAAAC 0CAACATTAC CTCGCAAAAT CTAACAGICA CACCAACACA AAATCTTCTT ACCACACACA ATCCTCTCAt TAATCCAACC
3601 ACCBGCACAC TAAACAHAG TACAAAAACA 6GGCATATTA AACCT6CAAT TCAATCAACT TCCCCTAATG TAAATATTAC ACCGACCCCC AATACACTTA
3701 AGGTAAGTAA TATCACTGGT CAABAT6TAA CAGTAACAGC GGATGCAGGA CCCTTCACAA CTACAGCAGG CTCAACCATT ACTCCCACAA CACCCAATCC
3801 aaatATTACA ACCAAAACAG GTCATATCAA CCGTAAWTT GAATCCACCT CCCGCTCTG, AACACTTGTT GCAACTGOAG CAACTCTTGC TGTAGCrAAT
3901 ATTTCMSCTA ACACTCTTAC TATTACTBCS 6ATAGCCCTA AATTAACCTC CACAGTAGCT TCIACAATTA ATGG6ACTAA TACIGTAACC ACCTCAAGCC
4001 AAICACCCCA TATTGAAC6T ACAATTTCT6 6TAATACACT AAATCTTACA GCAAGCA«G CTCATTTAAC TATTGCAAAT AGTCCAAAAC UGAAGCUA
4101 AAATCGAGCT CCAACCTTAA CTGCTGAATC AGGCAAATTA ACCACCCAAA CAGGCTCTAG CATTACCTCA AGCAATBCTC ACACAACTCT TACAGCCAAG
4201 CATACCACTA TCCCACOAAA CATTAATCCT CCTMXCTCA CGTTAAATAC C^CCCACT TTAACTACTA CAGGGGATTC AAACATTAAC GCAAOCAGTG
4301 6TACCTTAAC AATCAATGCA AAAGATSCCA AATTAGATCB TCCTGCATCA 6ST6ACCCCA CACTAGTAAA T6CAACTAAC GCAAGTCGCT CTCCTAACCT
4401 CACTCCGAAA ACCTCAAOCA CCBTGAAIAT CACCGGG6AT TTAAACACAA TAAATG66TT AAATAICATT TCCOAAAATC GTAGAAACAC TGTGCGCTTA
4501 AGAGGCAAGG AAATTGATGT GAAATATATC CAACCAGGTG TACCAACCCT AGAAGAG6TA ATIGAACCGA AACGCCTCCT TGAGAAGGTA AAAGATTTAT
4601 CTGATGAAGA AACAGAAACA CTAOCCAAAC TTCGTGTAAS T6CT6TACGT TTCGTTGAGC CAAATAATGC CATTACCGTT AATACACAAA ACGAGTTTAC
4701 AACCAAACCA TCAACTCAAG TGACAATTTC TGAAGGTAAG GCGTGTTTCT CAAGTGGTAA TCGCGCACGA GTATGTACCA ATGTTGCTGA CGATGGACAG
4801 ac 
[Figure: immunoblot lanes labeled HMW1 and HMW2]

FIG. [illegible]. Western immunoblot assay of phage lysates containing
either the HMW1 or HMW2 recombinant proteins. Lysates were
probed with an E. coli-absorbed adult serum sample with high-titer
antibody against high-molecular-weight proteins. The arrows indicate
the major immunoreactive protein bands of 125 and 120 kDa in
the HMW1 and HMW2 lysates, respectively.
[Figure: lanes U and I for each of pT7-7, pHMW1-2, pHMW1-4 and pHMW1-14]

FIG. [illegible]. Western immunoblot assay of cell sonicates prepared
from E. coli transformed with plasmid pT7-7 (lanes 1 and 2),
pHMW1-2 (lanes 3 and 4), pHMW1-4 (lanes 5 and 6) or pHMW1-14
(lanes 7 and 8). The sonicates were probed with an E. coli-absorbed
serum sample with high-titer antibody against high-molecular-weight
proteins. Lanes labeled U and I represent sonicates prepared
before and after induction of the growing samples with [illegible],
respectively. The arrows indicate protein bands of interest as
described in the text.
FIG. [illegible]. ELISA with rHMW1 antiserum assayed against purified
filamentous hemagglutinin of B. pertussis. Ab, antibody.
FIG. [illegible]. Western immunoblot assay of cell sonicates from a panel
of epidemiologically unrelated nontypeable H. influenzae strains. 
The sonicates were probed with rabbit antiserum prepared against 
HMW1-4 recombinant protein. The strain designations are indicated 
by the numbers below each lane. 

FIG. [illegible]. Western immunoblot assay of cell sonicates from a panel
of epidemiologically unrelated nontypeable H. influenzae strains.
The sonicates were probed with monoclonal antibody [illegible], a
murine IgG antibody which recognizes the filamentous hemagglutinin
of B. pertussis (13). The strain designations are indicated by the
numbers below each lane.



[Figure: immunoblot lanes labeled by strain number; molecular weight markers 200K and 94K]






[Figure: lanes 1 to 4; molecular weight markers 200, 116, 94, 67 and 43 kDa]

Fig. [illegible]. Immunoblot assay of cell sonicates of nontypeable H.
influenzae strain 12 derivatives. The sonicates were probed with
rabbit antiserum prepared against HMW-1 recombinant protein.
Lanes: 1, wild-type strain; 2, HMW-2⁻ mutant; 3, HMW-1⁻ mutant;
4, HMW-1⁻/HMW-2⁻ double mutant.
Western immunoblot assay with Mab AD6 and
HMW1A or HMW2A recombinant proteins
Western immunoblot assay with Mab AD6 and
ten unrelated nontypeable Haemophilus influenzae strains



[Figure: lanes 1 to 10; molecular weight markers 200, 94 and 67 kDa]



Figure 5 ? 3 



INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US97/04707 



A. CLASSIFICATION OF SUBJECT MATTER 

IPC(6) C07H 21/02, 21/04; C12P 21/06; A61K 39/102

US CL : 536/23.1, 23.4, 23.7, 24.3, 24.33; 435/69.1; 424/256.1
According to International Patent Classification (IPC) or to both national classification and IPC



B. FIELDS SEARCHED



Minimum documentation searched (classification system followed by classification symbols) 
U.S. : 536/23.1, 23.4, 23.7, 24.3, 24.33; 435/69.1; 424/256.1



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched



Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 

APS, DIALOG, CAS, MEDLINE, BIOSIS, MPSRCH

search terms: haemophilus influenzae, h. influenzae, high molecular weight, hmw 



C. DOCUMENTS CONSIDERED TO BE RELEVANT



Category* 



Citation of document, with indication, where appropriate, of the relevant passages 



Relevant to claim No. 



WO 93/19090 A1 (BARENKAMP) 30 September 1993, 
entire document. 

BARENKAMP et al. Cloning, Expression, and DNA Sequence 
Analysis of Genes Encoding Nontypeable Haemophilus 
influenzae High-Molecular-Weight Surface-Exposed Proteins 
Related to Filamentous Hemagglutinin of Bordetella pertussis.
Infection and Immunity. April 1992, Volume 60, No. 4, 
pages 1302-1313, entire document. 

WO 94/21290 A1 (BARENKAMP) 29 September 1994, 
entire document. 



1-4

2-4
1



1-4 



[X] Further documents are listed in the continuation of Box C. [ ] See patent family annex.



* Special categories of cited documents:

"A" document defining the general state of the art which is not considered to be of particular relevance
"E" earlier document published on or after the international filing date
"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)
"O" document referring to an oral disclosure, use, exhibition or other means
"P" document published prior to the international filing date but later than the priority date claimed
"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention
"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone
"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art
"&" document member of the same patent family



Date of the actual completion of the international search 
14 MAY 1997 



Date of mailing of the international search report 



10 [illegible] 1997



Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks
Box PCT 

Washington, D.C. 20231 
Facsimile No. (703) 305-3230 



Authorized officer 

JENNIFER SHAVER 
Telephone No. (703) 308-0196 



Form PCT/ISA/210 (second sheet)(July 1992)*






C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT

Category* | Citation of document, with indication, where appropriate, of the relevant passages | Relevant to claim No.
X | BARENKAMP et al. Genes Encoding High-Molecular-Weight Adhesion Proteins of Nontypeable Haemophilus influenzae Are Part of Gene Clusters. Infection and Immunity. August 1994, Volume 62, No. 8, pages 3320-3328, entire document. | 2-4



Form PCT/ISA/210 (continuation of second sheet)(July 1992)*






Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet)

This international report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. [ ] Claims Nos.:
because they relate to subject matter not required to be searched by this Authority, namely:

2. [ ] Claims Nos.:
because they relate to parts of the international application that do not comply with the prescribed requirements to such an extent that no meaningful international search can be carried out, specifically:

3. [ ] Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box II Observations where unity of invention is lacking (Continuation of item 1 of first sheet)
This International Searching Authority found multiple inventions in this international application, as follows:
Please See Extra Sheet.

1. [ ] As all required additional search fees were timely paid by the applicant, this international search report covers all searchable claims.

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment of any additional fee.

3. [ ] As only some of the required additional search fees were timely paid by the applicant, this international search report covers only those claims for which fees were paid, specifically claims Nos.:

4. [X] No required additional search fees were timely paid by the applicant. Consequently, this international search report is restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 1-4

Remark on Protest [ ] The additional search fees were accompanied by the applicant's protest.
[ ] No protest accompanied the payment of additional search fees.

Form PCT/ISA/210 (continuation of first sheet(1))(July 1992)*






BOX II. OBSERVATIONS WHERE UNITY OF INVENTION WAS LACKING
This ISA found multiple inventions as follows:

This application contains the following inventions or groups of inventions which are not so linked as to form a single inventive concept under PCT Rule 13.1. In order for all inventions to be searched, the appropriate additional search fees must be paid.

Group I, claim(s) 1-4, drawn to DNA and vectors.
Group II, claim(s) 5-9, 12 and 13, drawn to proteins.
Group III, claim(s) 10 and 11, drawn to conjugates.

The inventions listed as Groups I-III do not relate to a single inventive concept under PCT Rule 13.1 because, under PCT Rule 13.2, they lack the same or corresponding special technical features for the following reasons:

The special technical feature of Group I is DNA encoding [illegible]. This DNA is separate and independent [illegible] biologically, chemically and structurally different [illegible]. The special technical feature of Group II is high molecular weight proteins of Haemophilus influenzae which are separate [illegible] the conjugates of Group III, as they are not linked to an antigen, hapten or polysaccharide. These peptides [illegible] Group III. The conjugates of Group III are [illegible] the DNA encoding the proteins of Group II [illegible] and may be used as [illegible] different properties with no common link between them.




Form PCT/ISA/210 (extra sheet)(July 1992)*



